Journal of Experimental Psychology

Vol. 29, No. 5          November, 1941


DIFFERENTIAL EFFECT OF A SOCIAL VARIABLE UPON 
THREE LEVELS OF ASPIRATION * 


BY MALCOLM G. PRESTON AND JAMES A. BAYTON 


University of Pennsylvania


INTRODUCTION 


Experiments on the level of aspiration thus far reported have in 
common the fact that they require the S to estimate in advance what 
he expects to do on succeeding trials. That instructions, however 
well phrased, may be interpreted differently by different Ss has not 
gone unrecognized. Gould (5), for example, considered carefully 
whether the manner in which the Ss interpreted the question “What
will you do next time?” was responsible for the results furnished her
by her experiment. Again, Gould and Kaplan (7) mention in their 
investigation of the relationship of level of aspiration to academic 
and personality factors that it is important that the estimates be 
“controlled.” The same view is expressed by Hertzman and Festinger
(9). That different instructions will produce different numerical
results has been shown by Irwin and Goldich (11). These
investigators instructed some of their Ss to give an estimate of what 
they hoped to do on future trials while other Ss were instructed to 
give an estimate of what they thought they really would do. The 
distinction between the intellectual judgment, on the one hand, and 
the hope judgment, on the other, produced different numerical results 
in the two groups. A somewhat similar notion was exploited by 
McGehee (13), who required an observer as well as a performer to
estimate the future performance of the performer. Differences 
existing between the two sets of estimates were used by McGehee 
to distinguish between judgment, which was taken to be relatively 


* The investigators desire to express their appreciation to Professor F. C. Sumner and 
Mr. Thomas Hawkins of Howard University through whose coöperation this study was made
possible. 




independent of the ego of the observer, and level of aspiration, which 
according to McGehee is closely associated with the performer’s ego. 

Although the problem of the instructions offered the S heretofore 
has been regarded as bearing chiefly upon the proper formulation of 
methods, the problem is also related to the theory of the level of 
aspiration, a fact which becomes apparent if the concept of the level
of aspiration be compared with the concept of the psychophysical 
threshold. While it is a universally accepted convention that the 
threshold is that value of stimulus which produces a critical effect in 
50 percent of the stimulus applications, psychophysicists have not 
infrequently found it significant to study points other than the 50 
percent point, and indeed parameters of the function other than the 
percentile points. That important differences may exist between 
psychometric functions exhibiting the same stimulus value at the 
50 percent point has been well-known for a long time (8). Two 
functions yielding the same threshold value for example may differ 
enormously in respect of their precisions. Just as the threshold of the 
psychometric function may be distinguished from the precision, so 
may the level of aspiration as conventionally defined by the instruc- 
tions offered the Ss be distinguished from the range of performance 
values through which the Ss may be graduated on the basis of their 
aspirations. To bring point to this distinction, we may consider a
golfer who plays consistently in the nineties but who at the same time 
may hope to make an 85, be well satisfied with himself if he makes an 
89, actually expects to make a 91, may be badly disappointed if he
fails to break 100 and would expect to give up golf if he came in with 
120. That the latter attitude is a real one is indicated by the fact 
that golfers break their clubs, throw them away, and frequently 
exhibit temper tantrums if their scores rise too far above a level ac- 
ceptable to their ego. It follows that if it is meaningful to consider 
an aspiration range it is meaningful to consider any point on that 
range. Thus it is meaningful to raise the now classic problem of 
generality of the aspiration phenomenon over a series of dissimilar 
tasks in connection with the level defined by “the least you will do
and still maintain your self-respect” or the level defined by the
instruction “the most that you will ever do with everything favorable
to the outcome.”

The argument just made has been anticipated in the literature of 
the problem by Kneeland (12) who also recognized the methodological 
difficulties arising out of a failure to distinguish between the various 
levels of aspiration. Kneeland defined the ranges much as they were 
defined in this experiment but found little practical merit in the 
differentiation of the levels. Her conclusions we regard as resting 
upon an inadequate attack upon the problem. 




There are no reasons a priori which suggest the conclusion that 
each of the many levels which could be defined would react in the 
same way to the various forces already known to affect the level 
conventionally defined. On the contrary such a conclusion could be 
based only on an investigation of the question. From such an in- 
vestigation would flow a number of useful findings. If the experi- 
ment indicated that different levels respond differently to experi- 
mental variation light would be thrown upon the source of the con- 
flict between various investigators who have studied the question of 
generality of the aspiration phenomenon over a range of performances 
on dissimilar tasks. Again, the finding that differing levels of 
aspiration react differently under experimental variation would raise 
important questions concerning the theory of the level of aspiration. 
Such a finding would indicate a much greater complexity of function 
of the level of aspiration than has hitherto been granted on the basis 
of experimental evidence. In this respect it would support the view 
originally advanced by Frank (3) but criticised severely by Gould (5). 

The present experiment is concerned with the effect of a social 
variable upon three levels of aspiration: (1) the best that the Ss 
expected to do, (2) the performance actually expected, and (3) the 
least they expected to do. From these three levels may be calcu- 
lated three ranges. As subjects Negro College students were used. 
As a source of pressure upon the estimates the investigators used the 
device of telling the Ss that white students had reached a certain 
level of proficiency in the tasks used. To the data taken the follow- 
ing questions were addressed: 


1. What are the characteristics of the “range of aspiration”?
Under this question we may enumerate the following problems: 


a. Are the various levels under investigation here, i.e., minimum,
actual and maximum, and the various ranges affected by
factors specific to the task or are they independent of factors 
specific to the task? 

b. Where in the range of aspiration does the “actual” estimate
lie, i.e., is it closer to the maximum expected or the minimum
expected? 


Quite clearly the first question bears upon the classical problem of 
generality previously treated by Frank (3), Gould (5) and Gard- 
ner (4). 

2. Will information about the performance of a superordinate 
group have the same effect upon each of three levels of aspiration? 
This latter question we regard as an extension of the work of Chap- 
man and Volkman (1), Gould and Lewis (6), and Hilgard, Sait and 
Magaret (10). 




METHOD
I. Selection of Tasks 


Because this study was designed to attack methodological as well as theoretical questions
it was desirable to select tasks which had been used by other investigators in the field. Tasks 
used hitherto include addition, cancellation, the learning of arbitrary associations between digits 
and symbols, dart-throwing, the giving of synonyms and the maintenance of hand steadiness. 
Not all of these tasks could be adapted to the requirements of the present experiment. A pre- 
arranged sequence of scores was to be reported to the subjects (following the technique developed 
by Gardner (4)), and this meant that tasks could not be used which furnished clues on the basis 
of which subjects could become aware of their progress. This requirement restricted the experi- 
menters to the use of addition, symbols-digits, and cancellation, which lent themselves to the 
purposes of the experiment because they could be scored in terms of time, concerning which the 
Ss’ knowledge, as is well known, could be expected to be highly inaccurate.


II. Construction of Artificial Performance Curve


Gardner recognized that the level of aspiration expressed by the subject is a function of 
his objective performance and he observed that a methodological difficulty arises wherever the 
members of a group of subjects are performing at different levels on the one hand or are exhibiting 
different forms of the learning curve on the other hand. In order to control the objective situa- 
tion, he employed a prearranged sequence of fictitious scores. Using this technique all of his 
subjects based their levels of aspiration upon the same scale values. The method is particularly 
adaptable to experiments in which time scores are used. The experimenter, by keeping the 
subjects from observing the stop watch, can report scores which are fictitious and which, as a 
matter of fact, the subjects accept. In the present experiment the Ss were informed that the 
scores given them were not raw scores but were a special form of standard score. In passing 
it was remarked to the Ss that this procedure was justified on the grounds of “involved statistical
complications.” In each of the three tasks used in this experiment the arbitrary curve was 
placed upon a numerical level characteristic to it. This was done in order to prevent the ap- 
pearance of scores in the same numerical range in the case of more than one of the tasks. 

The following criteria were used by Gardner in the development of prearranged scores: 
(1) The learning curve should remain at the same level for several trials. Subjects come to the 
tasks with different ideas of what they can do and these trials serve to adjust them to their 
objective performance. (2) One part of the curve should show increase in ability. This is a 
situation which suggests success. (3) One part of the curve should show decrease in ability. 
This is a situation which suggests failure. (4) The curve should have enough irregularity to 
simulate reality, but enough regularity for the subjects to recognize the trends. In addition to 
these the present study met the following criteria: (5) The first trials of the second session were 
below the last trials of the first session. This was done to indicate some loss of ability over time. 
(6) Progress in the second session went beyond the level reached in the first session, indicating 
general improvement over the first session. 


III. Rotation of Tasks 


In order to eliminate any artifact which might have arisen from the order in which the tasks 
were presented,¹ the tasks were rotated in order of presentation. As there were three tasks to
be used, these could be presented in 6 ways. Since there were two sessions, there were 30 ways 
in which two orders could be drawn from the 6 orders. This made it necessary to have 30 Ss 
each in the two groups, control and experimental. 
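The counting behind this design can be checked with a short sketch. It assumes, as the text implies but does not state outright, that each S received a different order in each of the two sessions, so that the 30 arrangements are the ordered pairs of distinct orders. The task labels and variable names are illustrative only.

```python
from itertools import permutations

# The three tasks retained for the experiment.
tasks = ("addition", "symbols-digits", "cancellation")

# Every order in which three tasks can be presented: 3! = 6.
orders = list(permutations(tasks))

# Assumption: the Session I order differs from the Session II order,
# so an arrangement is an ordered pair of two distinct orders: 6 * 5 = 30.
arrangements = list(permutations(orders, 2))

print(len(orders), len(arrangements))
```

Assigning one arrangement to each S accounts for the 30 Ss required in each group.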


IV. Subjects 


Sixty subjects were used. All subjects were Negro men attending Howard University. 
Each subject was tested individually, the entire experimental period requiring about 1½ hours,


— 


¹ See Frank, J. D., The influence of the level of performance in one task on the level of
aspiration in another. J. Exp. Psychol., 1935, 18, 159-171. In this paper Frank shows that
changes in performance level in one task affect the height of the level of aspiration in another.
Since the extent of this effect is a function of the degree of similarity between the tasks it is
important if more than one task is used that they be suitably rotated in order to keep the effect
constant.




with a 10-minute rest period between sessions. All contact with the subjects was made by the 
Negro experimenter. The Ss were assigned to experimental and control groups randomly. 


V. Experimental Variation


Between the first and second sessions the members of the experimental group were told that 
their average performance for Session I was the same as that made by white students at the 
University of Pennsylvania who had been tested by a white experimenter. They were then 
told the averages that the Pennsylvania students had allegedly made in the second session. 

Exactly the same information was given to the control group except that they were told 
that the other scores were the averages of scores made by students at three Negro colleges who 
had been tested by the same Negro experimenter. Three colleges were mentioned in order to 
avoid any excessive rivalry which might have been aroused by reference to a single Negro college. 


The scores reported to all subjects are given in Table 1. Trials
1-10 represent Session I and Trials 11-19, Session II.


TABLE 1

PREARRANGED SCORES

Trials   Symbols-Digits   Cancellation   Addition
   1          101              40            95
   2          100              39            96
   3          101              40            95
   4          111              50           105
   5          117              56           111
   6          114              53           108
   7          112                           106
   8          113              52           107
   9          120              39           114
  10          124              63           118
  11          118              57           112
  12          119              56           111
  13          118              57           112
  14          128              67           122
  15          134              73           128
  16          131              70           125
  17          129              68           123
  18          130              69           124
  19          137              76           131


VI. Camouflage 


There were two respects in which the experiment required the deliberate deception of the 
subjects. These were the use of fictitious performance scores and the use of fictitious comparison 
averages. Great care was taken to keep the subjects in ignorance of the fact that fictitious scores 
were used, and it is believed that in both instances the deception was successful. None of the 
subjects questioned the fictitious comparison scores, but a few did question their own performance 
scores. This occurred when the reported score seemed to them to be inconsistent with their 
feelings about their performance. The experimenter was successful in erasing this doubt by 
calling attention to the fact that they were being graded in terms of fractions of seconds and 
that estimations of short intervals of time, in any event, were highly unreliable. 


VII. Instructions


Session I. 

All subjects were told they were to do a number of simple tasks and that they were to be 
scored in terms of the time required to complete them. They were also told that because of 
statistical complications they would be given standard scores rather than their actual times.
They were instructed that the scores made in one task had no relation to those made in another. 




Next they were informed of the three levels of aspiration they were expected to give. Three 
trials were then given and after each one the arbitrary score was reported to the subject. The 
subjects did not give levels of aspiration until after the third trial. 

After the third trial the following instructions were given: 


“Before you try this again I want you to tell me what you expect to do this time. As I
said before there are three levels that you are to tell me. First of all I want you to tell me the
highest score you feel you could possibly make. This does not necessarily mean for this next 
trial, but is, for instance, the score you think you could make if you practiced this task as much 
as you wanted to. You might call this level the one you would consider ‘hitting the jack-pot’ 
of your ability in this task. 

“Next I want you to tell me the lowest score you think you might make. This does not
mean you will drop this low, but if you should have a bad trial, what would you be willing to 
bet it would not be below. This score would make you disgusted and want to quit, but it excludes 
any low scores you might make because of accidents such as breaking your pencil point. 

“Finally I want you to tell me as accurately as possible just what you actually feel you
will do this next time. Use any information about your previous work in this task that will 
assist you in determining this estimate.” 


These three levels were obtained after each of the remaining trials in the session with the 
exception of Trial 10 which ended the first session. The levels expressed after this trial are 
considered as part of Session II since the experimental variation was introduced after the per- 
formance of Trial 10 and before the levels of aspiration were expressed for Trial 11. 

All subjects received the same instructions in Session I. 

Session II 

To avoid monotony in giving levels of aspiration throughout the two sessions, levels were 
obtained on every other trial in Session II. This was true for both experimental and control 
groups. Levels were obtained before Trials 11, 13, 15, 17, and 19. 

The following additional instructions were offered at the outset of Session II: 


Experimental Group 

“I have been assisting in a study of racial differences and I have given these tests to you
so that we will have a basis for racial comparisons. 

“You may be interested in knowing that the averages you made in the first part of this
experiment are practically identical with those made by a group of white students at the Uni- 
versity of Pennsylvania. They were tested by a white experimenter. In adding, your average 
was 106.1 and theirs was 106.0. In symbols-digits your average was 111.7, and theirs was 111.7. 
In cancellation your average was 51.1 and theirs was 51.13. As you can see, you are equal to 
them in performing these tasks. 

“Now we are going to complete your part in the experiment.” (The experimenter handed
the subject the appropriate task.) 

“As I said, you both had an average of ______ (here was given the value appropriate to
the task which the S was about to undertake). In the second period the Pennsylvania students
made an average of ______ (here was given the actual average of the prearranged scores on
the task to be undertaken by the S). On your last trial you made ______. What is the
best you think you will ever make in this test? What is the lowest score you might ever make? 
What do you think you will actually do this time?” 


Control Group 

The control group received the same instructions with the exception that the comparison 
groups were identified as coming from three Negro colleges and that they had been tested by a 
Negro experimenter. 


RESULTS 


I. Reliability—Table 2 gives the coefficients of correlation be- 
tween levels of aspiration and between ranges of aspiration as ex- 
pressed on the same task in Session 1 and Session 2. These correla- 
tions may be interpreted as giving the reliabilities of the measures 




of the levels and of the aspiration ranges. In the first row of Table 2 
appear coefficients calculated between the Maximum estimates ob- 
tained in the first and second session in the case of the three tasks for 
experimental and control groups. The measures correlated were the
average Maximum estimates of each subject in Sessions I and II,
respectively. The experimental group yields an r of .85 ± .04 when
its Maximum estimates on symbols-digits for the first session are 
compared with its Maximum estimates on symbols-digits for the 
second session. In the second row appear values based on Actual 
estimates. Row 3 gives coefficients based on Least estimates. Row 
4 exhibits values based on the range between estimates of Maximum 
and Least. The measures correlated were the average differences 


TABLE 2

COEFFICIENTS OF CORRELATION BETWEEN LEVELS OF ASPIRATION AND BETWEEN RANGES OF
ASPIRATION EXPRESSED ON SAME TASK IN SESSION I AND SESSION II

           Symbols-Digits         Cancellation           Addition
Level      Exp.       Contr.      Exp.       Contr.      Exp.       Contr.
Max        .85±.04    .93±.02     .86±.03    .81±.04     .82±.04    .53±.09
Actual     .85±.04    .41±.10     .45±.10    .32±.11     .84±.04    .61±.08
Least      .95±.01    .78±.05     .86±.03    .79±.05     .50±.09    .55±.09
M-L        .99±.005   .67±.07     .88±.03    .74±.06     .69±.07    .82±.04
M-A        .73±.06    .98±.005    .56±.08    .81±.04     .69±.07    .88±.03
A-L        .80±.04    .83±.04     .56±.08    .53±.09     .82±.04


between the levels in the two sessions. The experimental group 
yields an r of .99 ± .005 when the range between the Maximum and
Least levels on symbols-digits for the first session is compared with 
its range between Maximum and Least estimates on symbols-digits 
for the second session. Row 5 gives values of r based on the range 
between estimates of Maximum and estimates of Actual. Row 6 
gives coefficients based on the range between estimates of Actual 
and estimates of Least. 

From this table it is apparent that for both experimental and 
control groups, for all tasks, for the three levels of aspiration, and for 
the three ranges of aspiration a very considerable degree of reliability 
is present in the data. The lowest coefficient of the series of 36 
coefficients is .32 ± .11 observed in the case of the actual estimates in
cancellation for the control group. The highest coefficient is
.99 ± .005 observed on symbols-digits (M-L), for the experimental
group. It will be observed that there appears to be no trend which 
distinguishes on the basis of the reliability of the results, the level 
estimates from the ranges based on these estimates. 




II. Generality—Table 3 gives the correlations between the tasks 
for the experimental and control groups in Session 1 and 2. Column 1 
lists levels and ranges. Columns 2 and 3 give the coefficients of 
correlation between estimates on symbols-digits and addition for the 
experimental and control groups, respectively. Columns 4 and 5 
give the coefficients from estimates on cancellation and addition for 


TABLE 3

CORRELATIONS BETWEEN TASKS FOR EXPERIMENTAL AND CONTROL GROUPS,
SESSION 1 AND SESSION 2

           Symbols-Digits        Cancellation          Symbols-Digits
           and Addition          and Addition          and Cancellation
Level      Exp.       Contr.     Exp.       Contr.     Exp.       Contr.

Session 1
Max        .78±.05    .48±.09    .61±.08    .47±.10
Actual     .81±.04    .47±.10    .85±.04    .39±.11    .70±.06    .43±.10
Least      .69±.07    .53±.09    .67±.07    .72±.06    .65±.07
M-L        .64±.07    .78±.05    .80±.04    .44±.10    .66±.07    .31±.11
M-A        .62±.08       ±.04    .33±.11    .79±.05    .81±.04    .90±.02
A-L        .74±.06    .77±.05    .75±.06    .32±.11    .92±.02    .50±.09

Session 2
Max        .93±.02    .82±.04    .95±.01    .78±.05    .85±.04    .96±.01
Actual     .80±.04    .57±.08    .71±.06    .66±.07    .74±.06    .54±.09
Least      .92±.02    .95±.01    .93±.02    .79±.05    .92±.02    .84±.04
M-L        .41±.10    .84±.04    .96±.01    .79±.05    .79±.05    .73±.06
M-A        .82±.04    .99±.005   .95±.01    .92±.02    .83±.04    .95±.01
A-L        .97±.01    .96±.01    .93±.02    .75±.06    .90±.02    .80±.04

Probabilities that the differences between corresponding r’s in the two
sessions could have occurred in random sampling

Max        .01        .34        .10        .01
Actual     .46*       .30        .09*       .08        .39
Least      .01        .01        .33        .41        .05
M-A        .05        .03        .41
A-L        .01        .01        .02

* r in Session 1 higher than corresponding r in Session 2.


both experimental and control groups. Columns 6 and 7 give the r’s 
between estimates on symbols-digits and cancellation for the two 
groups. The upper part of the table gives the r’s for the first session;
the middle part gives the r’s for the second session; the third part of
the table gives the probabilities that differences between correspond- 
ing r’s in the two sessions could have occurred in random sampling. 

In Row 1, Column 2, is the value of .73 ± .06 yielded by the
experimental group in Session 1 when its Maximum estimates for
symbols-digits are correlated with those for addition. The measures




correlated were the average Maximum estimates. In Row 4,
Column 2 is a value of .64 ± .07 yielded by the experimental group
in Session 1 when the range between the Maximum and Least levels
for symbols-digits is correlated with the range between the Maximum
and Least levels for addition. In Row 7, Column 2, is a value of
.93 ± .02 yielded by the experimental group in Session 2 when its
Maximum estimates for symbols-digits are correlated with those for
addition. In Row 10, Column 2, is a value of .41 ± .10 yielded by
the experimental group in Session 2 when the range between the
Maximum and Least levels for symbols-digits is correlated with the
same range for addition.

The above correlations demonstrate that generality of the level 
of aspiration phenomenon exists over the various tasks under the 
conditions of the present experiment. The lowest of the 72 r’s is
.31 ± .11 observed in Session 1 in the case of the M-L range for the
control group in the correlation of symbols-digits and cancellation.
The highest r is .99 ± .005 seen in Session 2 in the control group for
the M-A range in the correlation of symbols-digits and addition.
These findings support the conclusions of Frank (3) and Gardner (4).
The present experiment also demonstrates that generality exists
for the ranges between levels.

As we have already remarked, previous experimentation on the 
problem of generality of level of aspiration has produced conflicting 
results. Table 3 supports the conclusion that lack of control of the 
estimates from subject to subject and within the same subject is a 
source of experimental unreliability which cannot help but obstruct 
the solution of the generality problem. With the present method of 
control through more specific instructions it appears that level of 
aspiration is independent of the task and makes its presence felt in 
all levels and ranges. 

It must be pointed out that these values could be, to some extent, 
functions of factors inherent in the experimental procedure. First, 
all of the tasks required estimates to be made which were essentially 
based upon intervals of time. Secondly, although an attempt was 
made to have the scores fluctuate within different scale regions these 
regions might not have been widely separated enough to give the 
subjects freedom to estimate without being affected by the scores 
and estimates of the other tasks. In these two respects our experi- 
ment is subject to somewhat the same reservations which Gardner 
placed upon his own results. They cannot, however, explain the 
fact that our coefficients are higher than those reported by him. 

Inspection of Table 3 indicates that the r’s in Session 2 are gen- 
erally higher than those in Session 1. The third part of the table 
studies the extent to which this tendency is statistically reliable. 




The statistical reliability of each of the 36 differences between corre- 
sponding coefficients was determined by the use of Student’s Z dis- 
tribution (2), a procedure which permits the evaluation of the sig- 
nificance of the difference between coefficients of correlation when 
the number of cases is small or the coefficients are high. There are 
32 instances in Table 3 in which the r observed in Session 2 is larger 
than the corresponding r observed in Session 1. Of the 32 instances 
16 yield a P value of less than .01, from which we conclude that in
each of these 16 instances the difference is so large that it would occur 
in random sampling less than once in 100 trials. In addition it can 
be seen from the table that in none of the four instances where the 
direction is contrary to the trend is the difference sufficiently large 
to be regarded as significant. The smallest P value of the four is 
.09, the next smallest .33.
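The computation behind these P values is not reproduced in the text. If the procedure cited is the familiar transformation of r to z, a comparison of two coefficients could be sketched as follows. The function name is hypothetical, the normal approximation is used for the probability, and treating the two r’s as independent samples is an assumption that is only approximately appropriate here, since the same Ss contributed to both sessions.

```python
import math

def fisher_z_difference(r1, n1, r2, n2):
    """Critical ratio and two-tailed P for the difference between two r's.

    Assumption: the two coefficients are treated as independent samples.
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)          # r-to-z transformation
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))  # standard error of z2 - z1
    cr = (z2 - z1) / se                              # critical ratio
    p = math.erfc(abs(cr) / math.sqrt(2.0))          # two-tailed normal probability
    return cr, p

# Illustration with coefficients quoted in the text: Maximum estimates,
# symbols-digits with addition, experimental group, N = 30 in each session.
cr, p = fisher_z_difference(0.73, 30, 0.93, 30)
print(round(cr, 2), round(p, 3))
```

The transformation is useful precisely in the circumstances named above: the sampling distribution of r is badly skewed when N is small or r is high, whereas that of z is nearly normal.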

The statistical significance of the fact that 32 out of the 36 differ- 
ences occur in the same direction may be evaluated by considering 
the probability that such a preponderance would occur if nothing but 
chance was affecting the direction of the differences. If such were 
the case we would expect approximately 18 of the differences to occur 
in one direction and approximately 18 in the other. The standard
error of this expectation (14) is given by the formula

$$\sigma_p = \sqrt{Npq},$$

where in our case p = q = .5 and N = 36. Application of the
formula yields $\sigma_p = 3.00$. Since the difference between the ob-
served and expected frequencies is 14 the CR is 4.67, and the likeli- 
hood of the discrepancy occurring in random sampling is sufficiently 
remote for the sampling hypothesis to be rejected. In summary, 
therefore, whether we depend upon the significance of the difference 
between individual r’s or upon the trend apparent in the data as a 
whole it is quite evident that the r’s in Session 2 are affected by condi-
tions which result in their being higher in value than those in Session 1.
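The arithmetic of the critical ratio can be retraced in a few lines. The sketch below uses only the figures given in the text (36 differences, 32 in the same direction, p = q = .5). The exact binomial tail computed at the end is an addition for comparison and is not part of the original analysis.

```python
import math

n_differences = 36   # differences between corresponding r's
n_same_direction = 32  # Session 2 r larger than Session 1 r
p = q = 0.5          # chance probability of either direction

expected = n_differences * p                       # 18 expected in each direction
sigma = math.sqrt(n_differences * p * q)           # standard error of the expectation
cr = (n_same_direction - expected) / sigma         # critical ratio, 14 / 3

# Addition for comparison: exact probability of 32 or more of the 36
# differences falling in one direction under chance alone.
exact_tail = sum(
    math.comb(n_differences, k)
    for k in range(n_same_direction, n_differences + 1)
) / 2 ** n_differences

print(expected, sigma, round(cr, 2), exact_tail)
```

Both the critical ratio and the exact tail point to the same conclusion, namely that a preponderance of this size is very unlikely under the sampling hypothesis.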

An interpretation of these latter facts is suggested by Frank’s 
observation (3) that estimates made on the pitching of quoits gave 
the lowest correlations with the estimates of his other tasks. This 
fact he attributed to the effect of the “‘play attitude” of his subjects 
which he observed during the performance of the task. Where 
Frank observed that the coefficients were low when his Ss were 
playing we observe that coefficients are increased when the Ss are 
affected by conditions of rivalry. The conclusion is inescapable that 
the Ss’ attitude is an important factor in determining the degree of
generality present in the data. In extension of this argument it may 
be noted that these results are consistent with the hypothesis that 
the appearance of generality among levels of aspiration expressed on 




different tasks is conditioned by the extent to which the ego is in- 
volved in the task set the subject. 

III. Effect of the Social Variable.—Table 4 gives the percentage of
the experimental group reaching or exceeding the twenty-fifth per- 
centile, the median, and the seventy-fifth percentile of the control 
group for all tasks, for estimates Maximum, Actual, and Least, and 


TABLE 4

PERCENTAGE OF EXPERIMENTAL GROUP REACHING OR EXCEEDING THE 25-50-75 PERCENTILES
OF THE CONTROL GROUP (LEVELS)

Session I

                          |      Maximum       ||       Actual       ||       Least
Task            Trials    |  25  |  50  |  75  ||  25  |  50  |  75  ||  25  |  50  |  75
Symbols-Digits  4, 5, 6   | 73.5 | 50.0 | 35.8 || 65.3 | 42.9 | 15.0 || 66.7 | 42.0 | 18.5
                7, 8      | 71.7 | 65.6 | 43.5 || 70.0 | 43.8 | 27.1 || 70.8 | 44.6 | 22.3
                9, 10     | 72.5 | 64.3 | 42.4 || 62.2 | 44.6 | 28.6 || 72.5 | 54.9 | 33.5
Cancellation    4, 5, 6   | 74.2 | 61.6 | 36.2 || 76.0 | 54.0 |  2.3 || 65.0 | 46.7 | 31.7
                7, 8      | 75.4 | 56.7 | 37.8 || 77.3 | 62.0 | 19.7 || 71.3 | 50.0 | 31.0
                9, 10     | 76.6 | 59.3 | 37.3 || 67.2 | 58.9 | 38.6 || 71.7 | 49.7 | 28.3
Addition        4, 5, 6   | 59.6 | 50.0 | 37.0 || 74.2 | 43.3 | 23.9 || 69.2 | 34.8 |  9.5
                7, 8      | 56.3 | 50.3 | 36.7 || 59.6 | 44.2 | 23.3 || 61.7 | 45.0 | 12.2
                9, 10     | 59.5 | 48.9 | 32.9 || 60.7 | 45.7 | 28.3 || 68.3 | 41.3 | 16.3

Session II

                          |      Maximum       ||       Actual       ||       Least
Task            Trials    |  25  |  50  |  75  ||  25  |  50  |  75  ||  25  |  50  |  75
Symbols-Digits  1+2       | 58.7 | 41.3 | 25.7 || 61.7 | 29.6 | 18.3 || 61.3 | 31.4 | 17.0
                3         | 63.8 | 49.1 | 27.9 || 74.2 | 64.1 | 36.8 || 64.2 | 40.0 | 27.5
                4+5       | 65.3 | 49.6 | 33.8 || 71.4 | 55.8 | 29.0 || 76.0 | 36.8 | 15.9
Cancellation    1+2       | 66.6 | 38.7 | 15.9 || 48.6 | 41.3 | 21.0 || 62.0 | 14.6 |  6.9
                3         | 75.0 | 50.0 | 24.2 || 79.5 | 71.6 | 36.6 || 65.8 | 50.0 | 13.9
                4+5       | 71.3 | 51.3 | 31.4 || 72.7 | 59.7 | 30.0 || 73.6 | 50.6 | 32.3
Addition        1+2       | 68.3 | 53.3 | 32.3 || 80.0 | 46.8 | 17.6 || 42.5 | 23.3 |  9.2
                3         | 82.6 | 68.2 | 44.7 || 68.4 | 60.0 | 49.2 || 55.6 | 40.0 | 12.0
                4+5       | 76.0 | 60.3 | 37.6 || 60.3 | 50.7 | 40.7 || 68.3 | 40.0 | 21.8


for both sessions. Data from Session I is in the upper half of the 
table. In Session I three percentages are given for each task at each 
percentile. The first of the percentages was obtained by comparing 
the average estimates made on trials 4, 5 and 6 by the experimental 
group with the average estimates made on the same trials by the 
control group. (No estimates were made during Session I until trial 
3 had been completed.) The second value reported is based on the 
averages of the estimates made on trials 7 and 8. The third value 




reported is based on the averages of the estimates made on trials 9 
and 10. The value 73.5 is to be interpreted as meaning that 73.5 
percent of the experimental group exhibit a mean of the Maximum 
estimates on trials 4, 5 and 6 which reaches or exceeds the mean 
estimate on trials 4, 5 and 6 found at the 25th percentile of the control 
group. 
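
The overlap measure just described can be sketched in a few lines. This is an illustration only: the function name, the interpolation rule used to locate the percentile, and the scores are assumptions, since the text does not say how the percentiles of the control group were located.

```python
from statistics import quantiles


def percent_reaching(experimental, control, percentile):
    """Percent of experimental scores that reach or exceed the given
    percentile (1-99) of the control scores.

    Each score is assumed to be one subject's mean estimate over the
    trials being compared (e.g. trials 4, 5 and 6).
    """
    # quantiles(..., n=100) returns the 99 percentile cut points;
    # the "inclusive" rule is an assumption, not the authors' stated method.
    cut = quantiles(control, n=100, method="inclusive")[percentile - 1]
    hits = sum(score >= cut for score in experimental)
    return 100.0 * hits / len(experimental)


# Hypothetical mean estimates, not data from the experiment.
control = [10, 20, 30, 40, 50]
experimental = [15, 25, 35, 45]
print(percent_reaching(experimental, control, 25))
```

If the two groups did not differ, the three percentages for the 25th percentile, the median, and the 75th percentile would lie near 75, 50 and 25 respectively.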

In the second half of the table appear data from Session II. In 
this case the first value reported is based on the mean of the first two 
estimates made; the second value is the third estimate made and the 
last value is based on the mean of the fourth and fifth estimates 
made. The value 58.7 is to be interpreted as meaning that 58.7 
percent of the experimental group reach or exceed the 25th percentile 
of the control group, if the two groups are compared in respect to the 
average of the Maximum estimates made on trials 1 and 2 of Session II. 

Table 5 duplicates Table 4 in design. In Table 5, however, appear 
the probabilities that the various differences between observed and 
expected overlapping occur in random sampling.² It will be observed 
from Table 4, for example, that in Session II only 6.9 percent 
of the estimates made by the experimental group reached or exceeded 
the 75th percentile of the control group in the case of the Least 
estimates made on the first and second trials of cancellation. If no 
difference existed between the two distributions it would be expected 
that 25 percent of the one distribution would reach or exceed the 
75th percentile of the other. The likelihood of a discrepancy of 18.1 
percent occurring in random sampling is remote since the probability 
is 

Examination of Table 5 discloses the following facts: 


1. Of the 27 P values observed in Session I in the case of symbols-digits 
only 10 reflect a trend for the estimates of the experimental 
group to be equal to or greater than the estimates of the control group. 
Of the 27 P values, however, only 2 are less than .05 and only 4 are 
less than .10. Of the 27 P values observed in the case of cancellation, 
19 reflect a trend for the estimates of the experimental group 
to be equal to or greater than the estimates of the control group. Of 
the 27, none is less than .10. Of the 27 values observed in the case of 
addition only 6 reflect a trend for the estimates of the experimental 

² These probabilities were calculated using the formula given by T. L. Kelley, Statistical 
Method, New York, Macmillan, 1923, p. 318, for the standard error of the percentage of one 
distribution which reaches or exceeds a given percentile of another distribution. If no difference 
exists in overlap between the two distributions the percentage in the one reaching or exceeding 
the chosen percentile in the other will be equal to the complement of the chosen percentile to 1. 
Any observed difference between the complement to 1 and the observed percentage may be 
treated with the standard error given by Kelley as a C.R., and the reliability determined in the 
customary manner. 
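
The critical-ratio procedure described in this footnote can be sketched as follows. Kelley's standard error formula is not reproduced in the text, so a plain binomial standard error stands in for it here; that substitution, the one-tailed reading of the probability, the function name, and the group size of 40 are all assumptions, and the sketch is not the authors' computation.

```python
from math import sqrt
from statistics import NormalDist


def overlap_probability(observed_pct, percentile, n_experimental):
    """One-tailed probability of a discrepancy at least this large between
    the observed and the expected percent overlap, under no difference.

    Stand-in standard error: simple binomial, NOT Kelley's formula.
    """
    expected_pct = 100.0 - percentile          # complement of the percentile
    p = expected_pct / 100.0
    standard_error = 100.0 * sqrt(p * (1.0 - p) / n_experimental)
    critical_ratio = (observed_pct - expected_pct) / standard_error
    return 1.0 - NormalDist().cdf(abs(critical_ratio))


# 6.9 percent observed at the 75th percentile, where 25 percent is expected;
# n_experimental=40 is a hypothetical group size.
print(overlap_probability(6.9, 75, 40))
```

When the observed percentage equals its expectation the critical ratio is zero and the function returns .50, the largest value a one-tailed probability of this kind can take.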




TABLE 5 


PROBABILITIES THAT THE DIFFERENCES BETWEEN OBSERVED AND EXPECTED OVERLAPPING 
WOULD OCCUR IN RANDOM SAMPLING (LEVELS) 


Session I 
Maximum Actual | Least 
Task Trials = 
25 50 75 25 50 75 ] 25 50 | 75 
4,5,6 | .42 | .sot | .21t || .20 | .27 | || | | 
Symbols-Digits | 7, 8 35 .o4t | 31 .42} 33 33 2 
9,10 | .38 | .o7t | || | | |) | | 
4,5,6 | .46 | .12t | .15t || .46* | .36f | .24t || .20 | .36 | .2z7t 
Cancellation 7,8 .48* | | || .42* | | .29 .36 sot | .35% 
9, 10 .42* | .21f | || .20 | .13f || .42 49 .40f 


4,5,6 | .06 sot | | .46 .30 .46 .08 10 
Additions 7,8 .03 .48f | .17f || .07 .29 42 i 20 31 .07 


10 | .04 46 | .23f || .08 38t || .31 | 18 | 


Session 2 


Maximum Actual Least 
Task Trials 
25 50 75 25 50 75 25 50 75 i 
Il 16 .48f 35 08 .02 
Svmbols-Digits 3 .12 .46 .40f |! .46 .ogt 15 TS 
45 | | 48 | 23h |) | .35t | || | | 16 
2 25 16 12 .18 36 12 
Cancellation 3 .50* | .sof | .46 ont | | .o8 


4,5 | -33 | -44f | || -40 | .35f | || -44 | | .26f 


.29 | .26f || .29* | .43 .23 .O1 
Additions 3 .16* | .o2t | .26f .23 | .o1f .O4 17 06 


4,5 | 08 | .04f || .33 21 36 


* More than 75 percent of Experimental Group Equal to or Greater than 25th percentile of Control 
Group. 

† More than 50 percent of Experimental Group Equal to or Greater than median of Control 
Group. 

‡ More than 25 percent of Experimental Group Equal to or Greater than 75th percentile of Control 
Group. 


group to equal or exceed the estimates of the control group. Of the 
27, 2 are less than .05 and 8 are less than .10. It would appear from 
this analysis that the differences existing between the experimental 
and control group in Session I arise in random sampling. It might 
be thought that the fact that as few as 6 P values out of 27 lie in the 
same direction was significant of a trend for the groups to differ. 
This conclusion is weakened by the fact that any three P values 
taken from the same set of data cannot be regarded as being mathe- 
matically independent of each other. 




2. With respect to the Maximum estimates made in Session II 
it is apparent that no significant tendency exists for the two groups, 
experimental and control, to differ. 

3. With respect to the Actual estimates a slight tendency for the 
experimental estimates to exceed the control estimates is observed 
although little statistical reliability attaches to the observation. 
The existence of the tendency may be seen in the fact that for all 
three tasks the first trials of the experimental group tend to be below 
the control group in respect of the estimate made, while in the re- 
maining trials the experimental group tends to exceed the control 
group in size of estimate given. 

4. The effect of the experimental variation is seen clearly in the P 
values covering Least estimates. Of the 27 P values 6 are less than 
.02, and 10 are less than .10. The larger P values of the 27 occur in 
the instance of the later trials of the Session. No P value taken on 
the last trials is less than .10. The effect appears most clearly in the 
case of cancellation and additions. 

Table 6 duplicates the design of Table 4; the data in Table 6 how- 
ever pertain to the ranges of aspiration rather than the levels of 
aspiration. The upper half of Table 6 gives data taken during 
Session I, the lower half gives data taken during Session II. The 
numbers appearing in the various columns are percentages of the 
experimental group who reach or exceed given percentiles of the 
control group. For example, the value 84.1 in column 1, row 1 of 
the table is to be interpreted as meaning that 84.1 percent of the 
experimental group reach or exceed the 25th percentile of the control 
group in Session I when the average ranges between the Maximum 
and Least estimates observed on trials 4, 5 and 6 of symbols-digits 
of the experimental group are compared with the same values taken 
from the control group. 

Table 7 is similar in design and content to Table 5, the entries, 
however, consisting in the probabilities that discrepancies in percent 
overlapping observed in Table 6 occur in random sampling. From 
Tables 6 and 7 the following facts may be observed: 


1. Inspection of Tables 6 and 7 suggests strongly the fact that the 
differences observed between experimental and control groups in 
Session I are negligible. Of the 81 P values in Table 7 pertaining 
to Session I only 8 are equal to or less than .10. 

2. On the other hand the data from Session II indicate a strong 
tendency for the ranges to increase under the pressure of the social 
variation on trials 1 and 2 in the case of addition. This can be seen 
by examining the percent of the experimental group which reaches or 
exceeds the three percentiles for all three ranges in trials 1 and 2. 




TABLE 6

PERCENTAGE OF EXPERIMENTAL GROUP REACHING OR EXCEEDING THE 25-50-75 PERCENTILES
OF THE CONTROL GROUP (RANGES)

Session I

                          |       (M-L)        ||       (M-A)        ||       (A-L)
Task            Trials    |  25  |  50  |  75  ||  25  |  50  |  75  ||  25  |  50  |  75
Symbols-Digits  4, 5, 6   | 84.1 | 55.8 | 33.3 || 83.3 | 63.3 | 37.1 || 66.9 | 46.6 | 40.8
                7, 8      | 81.  | 66.6 | 32.5 || 79.4 | 60.0 | 51.6 || 74.1 | 52.6 | 25.5
                9, 10     | 77.1 | 53.3 | 28.7 || 93.2 | 67.2 | 36.9 || 63.6 | 50.0 | 23.3
Cancellation    4, 5, 6   | 75.0 | 60.9 | 40.6 || 75.6 | 57.8 | 35.8 || 69.7 | 60.0 | 25.5
                7, 8      | 74.1 | 60.8 | 38.3 || 85.0 | 50.0 | 35.0 || 70.0 | 56.6 | 18.3
                9, 10     | 75.0 | 60.0 | 30.0 || 82.5 | 52.0 | 32.2 || 66.2 | 60.0 | 40.0
Addition        4, 5, 6   | 80.0 | 56.7 | 30.0 || 78.3 | 57.8 | 37.7 || 74.4 | 43.3 | 36.6
                7, 8      | 62.9 | 45.7 | 32.2 || 75.0 | 64.0 | 23.6 || 57.7 | 45.0 | 33.7
                9, 10     | 72.1 | 46.6 | 38.3 || 83.5 | 56.6 | 30.0 || 68.3 | 49.0 | 40.0

Session II

                          |       (M-L)        ||       (M-A)        ||       (A-L)
Task            Trials    |  25  |  50  |  75  ||  25  |  50  |  75  ||  25  |  50  |  75
Symbols-Digits  1, 2      | 78.6 | 54.3 | 23.3 || 80.9 | 65.4 | 38.3 || 62.4 | 48.5 | 20.0
                3         | 74.5 | 56.6 | 25.5 || 75.0 | 50.0 | 19.6 || 65.9 | 50.0 | 32.8
                4, 5      | 80.0 | 51.0 | 26.6 || 87.5 | 67.0 | 34.6 || 74.6 | 50.0 | 28.3
Cancellation    1, 2      | 75.6 | 60.0 | 38.7 || 79.4 | 61.0 | 20.0 ||  …   |  …   |  …
                3         | 74.1 | 46.6 | 26.6 || 77.8 | 54.9 | 43.0 ||  …   |  …   |  …
                4, 5      | 68.0 | 37.5 | 25.0 || 81.3 | 50.0 | 19.0 ||  …   |  …   |  …
Addition        1, 2      | 90.3 | 85.0 | 40.0 || 94.4 | 66.6 | 32.7 || 87.0 | 70.7 | 48.3
                3         | 79.1 | 66.6 | 47.6 || 82.0 | 64.4 | 43.1 || 73.3 | 56.6 | 44.1
                4, 5      | 93.6 | 56.6 | 35.0 || 81.5 | 64.0 | 37.5 || 84.1 | 55.7 | 40.0


On the other hand there is no evidence that the extent of the range 
has been affected in the instance of estimates made on cancellation 
and on symbols-digits. Again, the estimates on trials 3, 4 and 5 of 
addition give no sign of being affected by the experimental variation. 


IV. Ranges of Aspiration.—While tabular evidence is not presented 
to illustrate the fact, a striking characteristic observed in the 
original data of the experiment is the fact that the range M-A is 
invariably smaller than the range A-L, i.e., Ss in this kind of experiment 
place their Actual estimate closer to their Maximum than to 
their Least estimate. In illustration of this fact, the median range 
M-A, Session I, addition, trials 4, 5 and 6 is 13.37; the median 
range A-L in the same session, same task and same trials is 29.99. 
The median range M-A, Session I, cancellation, trials 9 and 10 is 




TABLE 7 


PROBABILITIES THAT THE DIFFERENCES BETWEEN OBSERVED AND EXPECTED OVERLAPPING 
WOULD OCCUR IN RANDOM SAMPLING (RANGES) 


Session I 
(M-L) (M-A) (A-L) 
Task Trials | 
25 so | 75 || 25 50 75 25 50 75 
4,5,6 | .12* | | .18f || .2o* | | || .23 .46 .o7t 
Symbols-Digits | 7, 8 .26* | .o4f | .23f || .30*% | .17t | .o1t || .46 427 | 
g, 10 .42* | .42T | .35f || .o1* | | .15 12 50T | .42 


4,5,6 | .50*% | .14f | .06f || .48* | .24f | .21f || .29 17T | .48t 
Cancellation 7,8 31 S77 | | .Sot | .16f |] .27 | .38 
g, 10 50* | .20f | .31f || .20*% | .46f | .23f || .18 | .27t 


- 4,5,6 | .33* | .31f | .36f || .36* | .27t | .10f || .48 31 .23f 
Addition 7,8 .13 33 || | | .44 12 33 
g, 10 .40 36 || .17* | .27T | || .27 .46 .20f 


Session 2 
(M-L) (M-A) (A-L) 
Task Trials 


25 590 75 25 50 75 25 50 75 


I, 2 .38* | .40of | .42 .26* | .o7t | .21f || .12 .46 31 
Symbols-Digits 3 .50 | .48f || .50* | .soft | .33 18 sot | .21t 
2 .33* | .48T | |] .04* | .o1rf | || .48 sot | .36f 


.48* | .20of | .o7f |] .29* | | .26 .09 .42 o6t 


Cancellation 3 .46 .46f || .40* | | .15 38 | 
4, 5 .24 .26 50f .20* | .sof | .21 38 35 
1,23 .o1* | | |} .o1* | | .21f || .04* | .o1T | .o6f 

Addition 3 31* | | .02 15* | .o6f | |] .44 277 | .o4t 


4,5 | .or* | .33¢ | .23t || .20* | .12t | .20f || .31* | .36f | .oot 


* More than 75 percent of Experimental Group Equal to or Greater than 25th percentile of Control 
Group. 

† More than 50 percent of Experimental Group Equal to or Greater than median of Control 
Group. 

‡ More than 25 percent of Experimental Group Equal to or Greater than 75th percentile of Control 
Group. 


9.11; the comparable value in A-L is 15.00. The results are remarkably 
consistent in this respect both in Session I and Session II. 
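
The three ranges compared here follow directly from each subject's three estimates. The sketch below shows the arithmetic on invented figures; the function name and the data are assumptions for illustration and are not taken from the experiment.

```python
from statistics import median


def range_medians(estimates):
    """Median M-A, A-L and M-L ranges.

    `estimates` holds one (maximum, actual, least) tuple per subject.
    """
    m_a = [mx - ac for mx, ac, _ in estimates]   # Maximum minus Actual
    a_l = [ac - ls for _, ac, ls in estimates]   # Actual minus Least
    m_l = [mx - ls for mx, _, ls in estimates]   # Maximum minus Least
    return {"M-A": median(m_a), "A-L": median(a_l), "M-L": median(m_l)}


# Hypothetical estimates for three subjects.
subjects = [(60, 50, 30), (55, 48, 20), (70, 62, 45)]
print(range_medians(subjects))
```

For any one subject M-L is the sum of M-A and A-L, so a small M-A beside a large A-L means the Actual estimate sits near the top of that subject's range.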

A second fact which will not be documented with tabular evidence 
but which is a striking characteristic of the data is the tendency of the 
Ss to give their estimates in round numbers. 


DISCUSSION 


The results of the present experiment require discussion from two 
standpoints. They bear upon the methods which are available for 
the study of aspiration as well as upon the theory of the phenomenon. 
The present results differ from those obtained by other investigators 




in that the reliability of the estimates is very high and the degree of 
generality observed is very extensive. In this latter connection 
the evidence is strongly in support of the view defended by Gardner 
(4) and Frank (3). Both of these results are undoubtedly related 
to two important elements of the method used in the present experi- 
ment, namely the close definition of the level and the introduction of 
rivalry. That these two elements increased the precision of the 
estimates is clearly indicated by the study of the reliability coefficients 
and the coefficients of correlation calculated between tasks in the 
second session. 

That the various points on the aspiration range should not re- 
spond alike to the same experimental variation is suggested by a 
consideration of Frank’s (3) theory of the needs affecting the level 
of aspiration. In the argument flowing from his experiment Frank 
enumerated three needs animating subjects in a level of aspiration 
experiment. These needs were: (a) the need to make the level of 
aspiration approximate the level of future performance as closely 
as possible, (b) the need to keep the level of aspiration high regardless 
of the level of performance, and (c) the need to avoid failure. 

Among those who have criticized the theoretical structure of which 
these needs are a part is Gould (5), who found them little more than 
ad hoc deductions. Gould’s criticisms depend largely on the fact 
that Frank depended upon the low difference scores to support his 
‘need to avoid failure’ and upon the high difference scores for his 
‘need to keep the level high.’³ We are sympathetic with Gould’s 
criticism of the support given Frank’s theory by his data. On the 
other hand we believe that the failure of Frank’s data to show 
conclusively the operation of three needs is due not to the assumption 
of the needs as such, but rather to his conception of the level of 
aspiration. Conceiving the level as a point rather than a range, 
Frank was unable to produce three different effects in the same person 
by the same stimulus, an important desideratum if at the same time 
an S is to protect himself against failure and place his expectation 
high and approximate the level of future performance as closely as 
possible. 

If the need to avoid failure operates (to select one of these needs), 
one would expect it to affect not points in the upper segment of the 
aspiration range, but rather points in the lower segment of the range. 
A golfer in the 90’s who will not give up golf even if he comes in with 
a score of 150, we maintain, differs from the golfer in the 90’s who 
expects to break his clubs if he comes in with 115 in respect of his 
need to avoid failure. The more apparent the need to avoid failure, 


³ It will be remembered that Frank studied the difference between performance and estimate, 
the observed quantities being known as difference scores. In this respect his method differs 
essentially from that developed by Gardner (4) and used here. 




the less satisfactory, other things being equal, should be the minimally 
acceptable performance. Again, if the need to keep the expectations 
high operates, one would expect it to affect that expectation defined 
by the instruction “what is the best you will do granted everything 
is favorable to your performance?” And finally what other point 
on the curve should be affected by the need to approximate the level 
of future performance as closely as possible but that point corre- 
sponding to the Ss’ ‘actual’ expectations? 

It may reasonably be asked “‘ what are the grounds for concluding 
that these particular points on the aspiration curve are affected by 
the needs identified with them in the foregoing discussion?,” par- 
ticularly since Gould criticized Frank for the ad hoc identification of 
low difference scores with the need to avoid failure and high differ- 
ence scores with the need to keep the level high. That our argument 
differs essentially from Frank’s is apparent. Frank’s argument 
depended upon the interpretation of difference scores all taken on the 
basis of the same instructions, i.e., the instructions requiring the S 
to report the score he expects to make. On the other hand our argu- 
ment differentiates expectations and the various difference scores 
depend upon these differentiated expectations. It is in the success 
with which these differentiated expectations are defined for the 
subjects that the success in tapping the three needs lies. By definition, 
we maintain, the instruction to report that performance 
which would cause one to admit failure yields an estimate which is 
a function of the need to avoid failure. And similarly must the 
argument be developed for the other needs. 

While we have used Frank’s theory as a point of departure for a 
consideration of the theory of level of aspiration we do not wish to 
maintain that the experimental results are crucial for his variety of a 
multiple function theory. Rather we maintain that the concept of 
level of aspiration as commonly entertained must be expanded to 
include a variety of tendencies. In the way of identifying these 
tendencies we would identify two on the basis of our experimental 
results. In the first place we regard these results as consistent with 
the hypothesis that the S’s conception of his future prospects is 
conditioned by a self-imposed requirement that he protect himself 
against feelings of failure. In this sense the level of aspiration is 
affected by considerations of self-preservation. In the second place 
we regard it likely that expectations of future performance will be 
affected by a desire to do well. In this sense the level of aspiration 
is a form of self-motivation. Supporting this differentiation of 
function are the results of the experiment indicating that the Least 
estimate is affected in a way quite different from the effect observed 
in the Maximum and Actual estimates. 




The results of this experiment and the differentiation of the func- 
tion of the various levels of aspiration which the results support 
throw light upon the psychology of persons who are constantly sub- 
ject to the suggestion that they represent an inferior group, of which 
Negroes are an excellent example. The results and the theory 
dependent upon them suggest that in competing with whites the goal 
of the Negro is no different from that goal which animates him when 
in competition with members of his own group. However it is quite 
apparent that the Negro college men constituting the experimental 
group were affected by their knowledge of the performance of the 
whites in a manner such as to undermine their confidence in their 
ability to attain or maintain the standards which operated as their 
goal. The lowering of the Least estimates is to be regarded as a 
function of their definition of failure. It is a very reasonable hy- 
pothesis that their definition of failure is a product of the constant 
suggestion concerning their status to which they are subject. It is a 
very interesting fact that under the conditions of this experiment the 
experience of success in competition with the white group has as its 
consequence the disappearance of the discrepancy between the Least 
estimates of the experimental and control groups. This fact would 
seem to support the view that experiences of success have an im- 
portant influence upon the adjustment of those persons who are 
subject to feelings of inferiority arising from social pressures. 


(Manuscript received April 24, 1941) 
REFERENCES 


1. CHAPMAN, D. W., & VOLKMANN, J., A social determinant of the level of aspiration, 
J. abnorm. (soc.) Psychol., 1939, 34, 225-238. 

2. FISHER, R. A., Statistical Methods for Research Workers, London, Oliver & Boyd, 1934. 

3. FRANK, J. D., Individual differences in certain aspects of the level of aspiration, Amer. J. 
Psychol., 1935, 47, 119-128. 

4. GARDNER, J. W., Level of aspiration in response to a prearranged sequence of scores, J. exp. 
Psychol., 1939, 25, 601-621. 

5. GOULD, R., An experimental analysis of ‘level of aspiration,’ Genet. Psychol. Monogr., 1939, 
21, 3-115. 

6. GOULD, R., & LEWIS, H. B., An experimental investigation of changes in the meaning of 
level of aspiration, J. exp. Psychol., 1940, 27, 422-438. 

7. GOULD, R., & KAPLAN, N., Relation of ‘level of aspiration’ to academic and personality 
factors, J. soc. Psychol., 1940, 11, 31-40. 

8. GUILFORD, J. P., Psychometric Methods, New York, McGraw-Hill, 1936. 

9. HERTZMAN, M., & FESTINGER, L., Shifts in explicit goals in a level of aspiration experiment, 
J. exp. Psychol., 1940, 27, 439-452. 

10. HILGARD, E. R., SAIT, E. M., & MAGARET, G. A., Level of aspiration as affected by relative 
standing in an experimental social group, J. exp. Psychol., 1940, 27, 411-421. 

11. IRWIN, F. W., & GOLDICH, M., Unpublished research on the effect of instruction upon 
level of aspiration, conducted at University of Pennsylvania. 

12. KNEELAND, N., Self estimates of improvement in repeated tasks, Arch. Psychol., N. Y., 
1934, 163, 75. 

13. MCGEHEE, W., Judgment and level of aspiration, J. gen. Psychol., 1940, 22, 3-15. 

14. YULE, G. U., An Introduction to the Theory of Statistics, London, Charles Griffin, 1911. 




BODILY MOVEMENT AS RELATED 
TO PROBLEM SOLVING 


BY ALAN D. GRINSTED 


Louisiana State University 


It is often a matter of common observation that an individual in 
the process of solving a mental problem will strike various poses from 
time to time as he shifts from one to another mode of attack upon the 
problem. It has sometimes been suggested that these shifts in 
bodily posture are really shifts in postural set and appropriate to the 
new mode of attack about to be undertaken. Very little seems to 
have been done by way of investigation of this phenomenon. There 
have been, to be sure, many studies on the question of how we think 
through a problem, and the literature on motility is constantly 
growing; but seldom has anyone studied the relation of the two. In 
1908, Storring (16) reported in his study of thought processes that 
eye-movements sometimes occur in connection with visual imagery 
and in the perceiving of position of the various terms. He also 
reported that arm movements seem to represent relationships of 
right and left. Other studies in the early part of the century, such 
as those of Bonser (1), Courten (4), Curtis (5), Gamble (6), and 
Pintner (12), and others somewhat more recently (2, 13, 15) have 
been concerned with the relation of mental activity to circulation of 
the blood, to tongue movements, to movements of the larynx, to 
breathing, etc. General body activity as related to mental activity 
has, however, received little attention. Perhaps the most important 
work reported in this connection thus far is that of Ruth Clark (3), 
who, in connection with a study of imagery in silent thinking, made 
some observations as to the nature and extent of movement of 
subjects who were trying to answer such questions as: “How would 
you plan a trip to Iceland?” and “How would you entertain a party 
of blind children?” Miss Clark reports, on the basis of her observations, 
that “. . . the gross bodily behavior of each individual represented 
a characteristic mode of adaptation which he adopted toward 
all his problems. . . . This gross bodily behavior was so invariably 
adopted by each individual that it seemed to have no bearing on the 
solving of the various problems but was simply a bodily adjustment 
for the better concentration of attention” (3, p. 28). Obviously 
this is counter to the idea that the subject is assuming a postural set. 

Clark thinks that minor movements are more closely related to 
problem solving, and she finds some relation for some individuals be- 




tween imagery and eye-movements, articulatory movements, and 
incipient hand movements. In addition to her written observations, 
this investigator used a concealed telegraph key to record on a 
kymograph when the problem was presented; then she pressed the 
key every time she saw the subject move, and the subject pressed 
another key to show when the problem was solved. Thus she attempted 
to obtain a record of the amount of movement of any kind. 
Whereas she says on one occasion that “movement increased as 
thinking progressed” (3, p. 29), she later says, “the number of 
movements tended to decrease as the thinking was prolonged” 
(3, p. 46). The seeming contradiction of these two statements is 
not clarified. Clark also reports that “perceptible movements accompany 
thinking during not more than one-third of its total time. 
They come and go at irregular intervals with the exception of pauses 
immediately after the presentation of the problem and just prior to 
the subject’s final signal” (3, p. 55). She states elsewhere (3, p. 29) 
that the pauses after the problem averaged 2.7 sec. and those just 
before the solution, 3.2 sec. The pauses occurring immediately 
after presentation of the problem again suggest that the person is 
not ‘getting set’ to think out a solution. 

This study by Clark throws doubt upon the validity of the idea that 
the thinker assumes a posture appropriate to his problem. While 
she made no distinction between gross bodily movement and minor 
movements, such as those of the eyes, her results suggest that a 
person’s motility while problem solving is much more an expression 
of the person himself than of the problem which he is attempting to 
solve. Is this gross bodily behavior ‘simply a bodily adjustment for 
the better concentration of attention?’ This is the question upon 
which it was hoped that the present study might throw some light. 


THE PROBLEM 


In accordance with the preceding discussion it is the purpose of 
this experiment to study the relation of motility to problem solving. 
Presumably, if a person assumes postural sets appropriate to his 
thought processes, we may expect his ‘getting set’ movements to 
occur in an objective record of his motility. When one attack 
upon a problem does not bring about a solution and a new method 
of attack becomes necessary, we should expect that the subject will 
show a shift of position appropriate to his new postural set. 

The problem of this experiment, then, may be stated as follows: 
Do a person’s movements during a problem solving activity reveal 
any evidence of postural sets appropriate to the activity of searching 
for a solution? 




Apparatus.—In this experiment there was used a stabilometer chair of a type somewhat different 
from those mentioned in the literature by Szymansky (10, p. 19) and Renshaw (14). It is 
more complicated than those generally used, but has the advantage of being so constructed that 
the person sitting in it is unaware of the fact that measurements of his movements are being made. 
It is, to all appearances, simply a comfortable Morris chair with a foot rest.¹ However, the arms, 
back, seat, and foot rest are all mounted on coil springs, and from each of them there leads off a 
string to a polygraph that is mounted behind the chair. The polygraph has been sufficiently 
inclosed to render it almost noiseless even to one who listens for its sound, and no subject has given 
any sign of being aware of its existence. 

In the latter part of the experiment a pencil maze was used in conjunction with an original 
device to make the situation such that the subject is unable to see where he is going. This 
apparatus consists merely of a light piece of veneer wood 8} inches square and jg inch thick, in 
the center of which is a circular hole ¢ inch in diameter. By means of a beveled block of wood 
4 inch thick adjoining the hole, it is possible to staple a pencil in such a position that its point 
protrudes through the hole until it is in line with the center of the opening. By this means a 
person can run a pencil maze by watching his progress through the hole and still not be able to 
see ahead far enough to avoid blind alleys or to see openings before he gets to them. It is felt 
that a subject working a maze with this device is more nearly in the situation that an animal 
finds himself in when placed in a ‘wall’ maze than is generally true in the case of pencil mazes. 

Subjects.—All of the subjects used in this experiment were members of the psychology classes 
in Louisiana State University. There were fifty-three in all, two of whom were graduate students, 
the rest all being undergraduates. In the procedure to be described, all of the subjects were used, 
except as otherwise stated. So far as the experimenter could know, the subjects were all in good 
health, and each appeared for the experiment alone in accordance with a previously arranged 
appointment. If any selection occurred, it was merely as a result of the fact that some did not 
keep their appointments or avoided making one. These few would not have influenced the 
results greatly had they been included. Of those asked to perform as subjects, about 85 percent 
made appointments and appeared. There is little reason to believe that any were under any 
unusual emotional strain, but it was the first time that most of them had ever acted as subjects in 
a psychological experiment. 

Procedure.—Each S was experimented on alone. When he appeared he was seated in the 
stabilometer chair facing E, who told him that the purpose of the experiment was to see how he 
would go about solving some problems, and that consequently he was to ‘think aloud’ insofar 
as possible. 

The problems were given by EF, one at a time. Those used were of the kind that involve 
figuring out how to measure out two pints of water, using only a 7-pint pail and a 5-pint pail. 
All but one of the problems were taken from the average adult level of the latest revision of the 
Stanford-Binet Scale (17).2. E read each problem as instructed by the authors of the scale, with 
the exception that he always omitted telling S with which pail to begin until and in the event that 
it became evident that S could not solve the problem without that help. If none of the standard 
problems proved sufficiently difficult to baffle the subject for at least a full minute, an original 
and more difficult problem was added.³ 
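
The pail problems described above have a simple formal structure, and a short program can show the kind of search S was asked to carry out aloud. The following is only a minimal sketch, not part of the original procedure: the function name `measure` and the breadth-first strategy are illustrative assumptions, and a human solver would not necessarily proceed in this way.

```python
from collections import deque


def measure(cap_a, cap_b, target):
    """Search pail states (a, b) breadth-first; return the shortest list of
    states leading from two empty pails to one pail holding `target` pints,
    or None if the amount cannot be measured. (Hypothetical helper.)"""
    start = (0, 0)
    parent = {start: None}
    queue = deque([start])
    while queue:
        a, b = queue.popleft()
        if a == target or b == target:
            path, state = [], (a, b)
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        pour_ab = min(a, cap_b - b)  # amount that can go from pail A to pail B
        pour_ba = min(b, cap_a - a)  # amount that can go from pail B to pail A
        moves = [
            (cap_a, b), (a, cap_b),          # fill either pail
            (0, b), (a, 0),                  # empty either pail
            (a - pour_ab, b + pour_ab),      # pour A into B
            (a + pour_ba, b - pour_ba),      # pour B into A
        ]
        for nxt in moves:
            if nxt not in parent:
                parent[nxt] = (a, b)
                queue.append(nxt)
    return None


# The sample problem: two pints, using a 7-pint pail and a 5-pint pail.
print(measure(7, 5, 2))
# The added problem of footnote 3: three pints, using a 9-pint and a 5-pint pail.
print(measure(9, 5, 3))
```

Each returned list gives the successive contents of the two pails, so its length less one is the number of fillings, emptyings and pourings required.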

E also, by use of a concealed telegraph key, made code signal marks that were electrically 
recorded on the polygraph paper to show the times at which (1) the problem was given, (2) the 
subject showed verbally that he knew the answer, and (3) the subject gave verbal evidence that 
he was abandoning one mode of attack to undertake another, the criteria for this last being such 
remarks on the part of S as: “No, that won’t work,” or “I don’t think I can get it.” The signal 
for (1) was given at a point about midway in the reading of the problem on the assumption that S 
would be starting to search for the solution as soon as he had heard all the necessary details the 
first time. The signals for (2) and (3) were given immediately after E recognized the criteria 
calling for them [i.e., evidence of a shift in attack for (3) and evidence of a correct solution for 
(2)]. The signal for (2) was also given when S gave definite signs of giving up and E felt that 


1 A description of the apparatus will appear in the Am. J. Psychol. (October) 1941. 

2 The usual order of giving the problems was to give all of those in Form M, average adult 2, 
p. 177; and then one or two of those in Form L, average adult 6, the choice depending upon the 
experimenter’s judgment as to which would best call forth the subject’s thought processes and 
yet not be too discouraging. 

3 To get 3 pints of water, measuring with only a 5-pint pail and a 9-pint pail. 


BODILY MOVEMENT AS RELATED TO PROBLEM SOLVING 373 


further encouragement was useless. Encouragement was given S to go on, however, when S 
tended to give up too easily, and in such cases the signal for (3) was given. Aside from brief 
words of encouragement from E in such cases, however, he avoided any conversation with S 
while S was trying to solve the problem. The problems were given in succession until at least 
four had been attempted and until at least one had consumed a full minute. In three cases S 
solved all of the problems given, including the added, more difficult one in less than a minute each. 
In the case of 45 subjects, the last problem was announced as the last at least twice, with such 
remarks as: “We'll try one more,” followed by, “This is the last problem.” 

In the case of 38 subjects, a maze-solving technique was tried after the problem solving part 
of the experiment was concluded; however, the subjects were given no advance notice that any- 
thing else was to be required of them until the problem solving was ended. Each S was asked 
whether he knew what a maze was, after which a simple pencil maze was shown him with an 
explanation of its use. Then there was placed across the arms of the stabilimeter chair a wide 
board on which had been fastened with thumb tacks a Porteus pencil maze for the twelve-year 
level. The maze running device described under ‘Apparatus’ was already in place with the pencil 
on the starting point, and S was asked to trace his way out without crossing any lines and without 
removing his pencil from the paper. E stood where he could see the progress of S in his effort 
to work his way through the maze and where he could also reach the concealed telegraph key to 
record by the code signals as used in the problem solving part of the experiment the moment when 
S started to search for his way out of the maze and the moment when he finally reached the out- 
side. When the experiment was finally concluded, E asked each S not to discuss the experiment 
with anyone who had not already acted as a subject. 

In the problem solving part of the experiment, fifty-three records were obtained, covering 
201 readable problem records in all. (Some individual problems had to be left out of considera- 
tion because of one or another difficulty, such as failure of one of the pens to record on the entire 
record, failure of the experimenter to use the correct code signals, etc.) The entire record of each 
problem solving attempt was then divided into five-second intervals, starting at a point ten 
seconds before the signal marking the point where the problem was given, and continuing ten 
seconds past the signal showing that the solution had been reached. Each segment was then 
examined for any movement whatever, and the ‘active’ intervals (those in which any movement 
occurred) were marked in similar fashion to the method used in motility studies by Johnson (9, 
p. 255 f.) and by Garvey (7, p. 9). Analogous intervals in the several problems were then com- 
pared and percentages of active intervals among all those of a given type were worked out. 
Thus the percentages of active intervals out of the total of 201 problems considered were worked 
out for the two intervals just before and the two just following the presentation of the problem, 
the reaching of the solution, and the points where signals indicated that shifts occurred. Also, 
to obtain records not so closely related to these points, measurements were made of the four 
intervals following the two immediately after the presentation of the problem, and the last four 
immediately preceding the two just before the solution and not otherwise included. By the use 
of this method it was possible to construct a somewhat fictitious curve of a probably typical 
problem solving attempt, showing the average motility for all the subjects for the ten seconds 
just before and the ten just after the presentation of the problem, for the following twenty seconds 
(where no shift or solution signals occurred), for the ten seconds just before and the ten just after 
the first two shifts, for the last twenty seconds not included in either shift or solution, and for the 
ten seconds just before and the ten just after the signal showing that the problem had been solved. 
Inasmuch as the time taken to solve the various problems varied from problem to problem and 
from person to person, there was no better way of bringing the data together so as to show the 
effect of the different stages in the problem solving attempt. 
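
The scoring of ‘active’ intervals described above can be illustrated by a brief sketch. It assumes, hypothetically, that each record has been reduced to a list of times (in seconds) at which the pens showed movement; the function names and the sample values are illustrative and are not taken from the original records.

```python
def active_flags(movement_times, start, end, width=5.0):
    """Divide the span from `start` to `end` into intervals of `width` seconds
    and mark each interval True if any movement falls within it."""
    n = int((end - start) // width)
    flags = [False] * n
    for t in movement_times:
        idx = int((t - start) // width)
        if 0 <= idx < n:  # ignore movements outside the scored span
            flags[idx] = True
    return flags


def percent_active(flags):
    """Percentage of intervals marked active; None if there are no intervals."""
    if not flags:
        return None
    return 100.0 * sum(flags) / len(flags)


# Hypothetical record: movements at 1.0, 2.0 and 12.5 sec. in a 20-second span.
flags = active_flags([1.0, 2.0, 12.5], start=0.0, end=20.0)
print(flags, percent_active(flags))
```

To pool analogous intervals from several problems, as in the text, the flags for those intervals would simply be collected into one list before the percentage is taken.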

A curve was also constructed to show the relative amount of activity occurring in each quarter 
of the several problem solving attempts, not including the ten seconds just before or the ten just 
after the presentation of the problem or the point where the solution was reached. This curve, 
then, disregards the effect of the shifts of attack on the problem and is concerned merely with the 
comparative amount of motility in successive periods of the problem solving attempt. 

In order to more easily deal with the data statistically the problem solving records were also 
read in another way. Ten-second intervals were marked off immediately before and immediately 
following the point where each problem was first given, immediately before and immediately after 
each signal indicating a change in attack, and immediately before and immediately after each 
point where a signal indicated that a solution had been reached. Then those areas of the record 




374 ALAN D. GRINSTED 


which were not thus included in a ten-second segment were also divided off into ten-second in- 
tervals. Next the total number of pen oscillations within each of these intervals was counted, 
after the manner of Irwin’s treatment of motility records on infants (8, p. 95). Averages were 
obtained for each subject, one average for each of the types of interval marked. From these, 
averages were obtained for all subjects showing the number of pen oscillations in each of the types 
of interval for an average subject. 

The maze records were given less elaborate treatment. Each was divided into five-second 
intervals, and these intervals were each marked as to whether or not they were ‘active,’ that is, 
contained any movement whatever. Then each maze was divided into quarters, except that the 
interval in which the solution occurred was omitted in this division, and included with the two 
following it in a separate section altogether. Thus each maze was divided into five sections, the 
last including only the solution interval and the two following, and the other four each being made 
up of a quarter of the remaining record, starting where the action on the maze began. The 
percentage of active intervals in each of these sections was then calculated, and the data com- 
bined to obtain averages for all thirty-eight subjects. In this way it was possible to obtain a 
curve of the same sort as the second one mentioned above in connection with the problem solving 
part of the experiment. 
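
The division of a maze record into five sections may be sketched as follows. The rule used here for placing the quarter boundaries when the number of intervals is not a multiple of four is an assumption, since the text does not state one; the names and sample data are likewise hypothetical.

```python
def maze_sections(flags, solution_idx):
    """Split a list of per-interval activity flags into four quarters of the
    record preceding the solution interval, plus a fifth section made up of
    the solution interval and the two intervals following it."""
    body = flags[:solution_idx]
    tail = flags[solution_idx:solution_idx + 3]
    n = len(body)
    # Assumed rule: boundaries at whole-number quarter points (floor division).
    bounds = [(i * n) // 4 for i in range(5)]
    quarters = [body[bounds[i]:bounds[i + 1]] for i in range(4)]
    return quarters + [tail]


def percent_active(section):
    """Percentage of active intervals in a section; None for an empty section."""
    if not section:
        return None
    return 100.0 * sum(section) / len(section)


# Hypothetical record: eight intervals before the solution interval, then three.
record = [True, True, False, False, False, True, True, True, True, True, True]
print([percent_active(s) for s in maze_sections(record, solution_idx=8)])
```

Averaging the five percentages over all subjects would then give the points of a curve of the kind described.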


RESULTS 


Figure 1 shows the relative amount of motility that occurs in 
different stages of the problem solving process. This curve shows 
the percentage of ‘active’ five-second intervals in each period and 


[Figure 1. Horizontal axis: Periods (Presentation, Neutral, Shift, Shift, Neutral, Solution). Vertical axis: % Active Intervals.]

Fig. 1. Motility curve in problem solving, showing effect of shifts. 


indicates that the least motility occurs in the ten-second interval 
just before the signal marking the start of the problem solving at- 
tempt. Since the reading of the problem actually began in this 
interval, one might expect movement to increase if the subject were 
‘getting set’ to solve the problem. But even the next ten seconds 




following the giving of the problem are nearly as low in percentage of 
active intervals. In the next twenty seconds we find increasing 
movement. This is a period in which the person is presumably 
pushing toward a solution without any shifts in his mode of attack; 
however, since shifts were recorded only when S signified verbally 
that he was making a shift, it is reasonable to suppose that actually 
shifts did occur in such intervals and that such shifts may have 
accompanied the increased motility found here. The peaks of the 
curve are found at points just before shifts were known to have oc- 
curred and where the solution was reached. This last is highest of all. 


TABLE I 


MEANS, DIFFERENCES BETWEEN THE MEANS, STANDARD ERRORS, AND CRITICAL RATIOS 
FOR THE VARIOUS STAGES IN THE PROBLEM SOLVING PROCESS 


Stage with Mean                   | After Problem | Neutral Period  | Solution Period | Before Shift | After Shift | Before Last Solution | After Last Solution 
Before Problem (2.30±.286)        | .52±.385      | .36±[illegible] | 2.39±.483       | 1.37±.400    | .82±.720    | 1.46±.601            | 3.56±.806 
  Critical Ratio                  | [illegible]   | [illegible]     | 4.95            | 3.43         | 1.14        | 2.11                 | 4.42 
After Problem (2.82±.257)         |               | .16±.307        | 1.87±.466       | .85±.380     | .30±.709    | .94±.680             | 3.04±.706 
  Critical Ratio                  |               | .52             | 4.01            | 2.24         | .42         | 1.38                 | 3.82 
Neutral Period (2.66±.166)        |               |                 | 2.03±.423       | 1.01±.326    | .46±.681    | 1.10±.651            | 3.20±.771 
  Critical Ratio                  |               |                 | 4.80            | 3.10         | .68         | 1.69                 | 4.15 
Solution Period (4.69±.388)       |               |                 |                 | 1.02±.479    | 1.57±.766   | .93±.740             | 1.17±.847 
  Critical Ratio                  |               |                 |                 | 2.73         | 2.05        | 1.20                 | 1.38 
Before Shift (3.67±.280)          |               |                 |                 |              | .55±.717    | .09±.689             | 2.19±.803 
  Critical Ratio                  |               |                 |                 |              | .78         | .13                  | 2.73 
After Shift (3.12±.660)           |               |                 |                 |              |             | .64±.912             | 2.74±1.002 
  Critical Ratio                  |               |                 |                 |              |             | .70                  | 2.73 
Before Last Solution (3.76±.629)  |               |                 |                 |              |             |                      | 2.10±.680 
  Critical Ratio                  |               |                 |                 |              |             |                      | 3.09 
After Last Solution (5.86±.753)   |               |                 |                 |              |             |                      | 


The data presented here suggest that while movement does occur 
along with problem solving, it occurs most frequently after the 
problem is solved or just before a verbal sign is given that the subject 
is abandoning an old attack to undertake a new one and can hardly, 
therefore, be thought of as being due to the assuming of postural sets. 
Two questions arise at this point, however: (1) do the movements at 
the point of solution mean, perhaps, that the subject is getting ‘set’ 
for a new problem? and (2) are these differences between intervals 
reliable? 

In order to clear up the point raised in the first question the last 
forty-five subjects used in the experiment were told definitely when 
the last problem was presented that it was to be the final one. Pre- 
sumably an increase in movement following the last problem, then, 
could not be due to an assuming of a new postural set. Measure- 




ments of the motility following the solution of the last problem by 
these subjects, as well as new readings of the records as described 
above under ‘Procedure,’ were treated statistically. The results 
are shown in Table I. Here it is seen that the amount of movement 
at the time of solution and after the last solution is significantly 
higher than either the periods just before or just after the presenta- 
tion of the problem or the periods which were apparently devoted to 
‘just plain thinking.’ Also, the period just before the signal showing 
that the subject had given verbal evidence of an abandonment of his 
mode of attack is significantly greater in motility than the period 


[Figure 2. Horizontal axis: Periods (quarters of ‘inside’ intervals; interval at solution). Curves: Problem, Maze.]

Fig. 2. Motility curve in solving problems and mazes, showing merely successive periods. 


just before the presentation of the problem. This evidence seems to 
the writer to prove definitely that a person who gets ‘set’ mentally 
to solve a problem does not reflect that set posturally. 
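
The critical ratios of Table I are obtained by dividing the difference between two means by the standard error of that difference. A minimal sketch follows. The formula used for the standard error of a difference assumes independent means and is offered only as an illustration; the text does not state how the tabled standard errors were computed.

```python
import math


def se_of_difference(se_a, se_b):
    """Standard error of a difference between two means, ASSUMING the two
    means are independent (an assumption not stated in the text)."""
    return math.hypot(se_a, se_b)


def critical_ratio(mean_a, mean_b, se_diff):
    """Difference between two means divided by its standard error."""
    return abs(mean_a - mean_b) / se_diff


# Solution period (4.69 ± .388) against the period before the problem (2.30 ± .286),
# using the tabled standard error of the difference (.483).
print(round(critical_ratio(4.69, 2.30, 0.483), 2))  # about 4.95, as in Table I
print(round(se_of_difference(0.388, 0.286), 3))
```

The larger the ratio, the less likely it is that the difference between the two periods arose by chance.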

Figure 2 shows the motility accompanying problem solving with- 
out regard for shifts, and superimposed on it is a similar curve for the 
motility accompanying the solving of a pencil maze. They both 
reach a low point in the middle, and their highest points are found at 
the points of solution. Since the mazes used were on the order of a 
spiral, blind alleys were encountered much more frequently, in point 
of time, at the start than after some progress had been made. Also, 
the nature of the maze situation was such as to show the subject that 
he was making progress. This, of course, makes the maze situation 
quite different from that involved in the solving of the problems, 
where the longer the subject worked the more discouraged he was 




likely to become; and such differences might account for the difference 
in the amount of motility shown. The increasing encouragement and 
the relatively decreasing number of impasses may be reflected in the 
decreasing motility shown in the first part of the curve for the maze. 
The leveling off might be due to the reaching of a sort of normal 
level of motility, which is maintained until the end of the maze is 
reached. In the case of the problem solving, however, the more a 
subject works on a problem, the more likely he probably is to run 
into impasses over and over again, and if each one means a shift, 
then we have seen that such shifts accompany increased amounts of 
motility. Why there should be an increase in motility between the 
first and second quarters of the problem solving attempt is difficult 
to say. At any rate it may be said that the results of this study do not 
clearly agree with either of Clark’s statements mentioned above to 
the effect that “movement increased as thinking progressed” and 
that the “number of movements tended to decrease as the thinking 
was prolonged.” 


DISCUSSION AND CONCLUSIONS 


The writer feels that this experiment proves that the ‘postural 
set’ idea in problem solving is erroneous. If a person were to ‘get 
set’ to solve a mental problem or to assume a new mode of attack 
upon such a problem, it is reasonable to suppose that he would show 
increased motility just before undertaking a problem solving task 
or just before making each new attack upon the problem; but the 
experiment described above has shown definitely that a person does 
not show increased motility at such times. It is true that a person 
does move while performing such a task; but his movements come 
at the end of effort, not at the beginning. When the problem is 
solved, the greatest amount of movement occurs. When a shift in 
attack is made, movement occurs; but that movement occurs in 
connection with the abandonment of the old attack, rather than in 
getting set for a new one. This is shown in the fact that in the experiment 
as reported the period before the old attack was definitely 
abandoned is significantly higher than either the period just before 
the presentation of the problem or the neutral periods. The fact 
that the movement occurring after the solution of the problem is not 
due to the subject’s getting set for a new problem is shown in the 
fact that such movement occurred even when the subject knew that 
he had solved the last problem he was to be given. 

What, then, is the explanation of these movements? Perhaps 
a key to the answer to this question is to be found in H. M. Johnson’s 
explanation of the motility that occurs in sleep. He says, 




The longer a person lies still, the more evident it is that he is asleep to a very important 
group of disturbances, operating upon, and inside, his skin. As soon as he assumes any given 
position in bed, certain conditions begin to build up, which presently become irritating. A 
large area of skin is in close contact with the mattress-covering and a smaller area immediately 
in contact with the body-coverings, so that cooling by ventilation is prevented. All this 
skin grows warmer and warmer until it reaches a temperature very near to that of the in- 
terior of the body. Moreover, the blood and other bodily fluids tend to gravitate to the 
parts which are the lowest and there settle, while pressing upon the bodily tissues. The 
visceral organs themselves are movable and some press upon others, while straining the 
membranes by which they are attached to the walls of the trunk. The body as a whole 
presses upon the skin and muscles next to the mattress, and thereby restricts their blood 
supply. Some muscles are under tension in maintaining posture, while some joints are 
cramped. A muscle, even when relaxed, becomes irritable merely from being kept still. . . . 
All these irritating conditions increase with time, and normally produce a change of bodily 
position, by which they are relieved. Thereupon, the subject can rest until similar condi- 
tions are built up in other regions. To lie still for a considerable time requires a disregard 
of present irritation, or a condition of sleep with respect to it. (9, p. 251 f.) 


It does not seem unreasonable to suppose that the motility of the 
waking person can be explained in similar fashion to the above ex- 
planation of the motility of the sleeping person. As Johnson points 
out, the person who moves while ostensibly asleep is attending to 
particular stimuli—stimuli operating upon, and inside, his skin. 
When outside stimuli regain his attention, he awakens. Thus, 
motility in sleep is positive and purposive. The writer believes that 
the condition in the case of the person who is not sleeping is similar. 
It rather seems reasonable to suppose that just as attention is a 
factor in the motility of the sleeping person, so it is in the case of a 
person solving mental problems. The person while solving the problem 
is attending to that job, and is ‘asleep’ to certain other stimuli 
playing in upon him. After the problem is solved, he attends to the 
job of adjusting his body to the strains that his lack of motion so far 
has put him to. Similarly, there may be times during the solving of 
the problem that he may move because the tensions building up 
within his body have become so great that they have come to demand 
his attention. In other words, one’s attention is turned to the pre- 
potent stimulus of the moment, and the stimulus that is prepotent 
at the moment may not be prepotent at another moment, if another 
stimulus (in this case a tension) becomes great enough to demand the 
attention. So, if a person sits relatively still until he gets to a solu- 
tion, a partial solution, or an impasse in his problem, a resultant 
lagging in interest, either permanent or temporary, permits his 
bodily tensions to become more important, relatively, than they 
formerly were. His attention turns to them, and he moves. The 
results of investigations by Ovsiankina and Zeigarnik (11, p. 254 f.) 
imply strongly that the unfinished task holds the attention more 
strongly than that which is finished. Thus we should expect that the 
task which is solved would show more movements than occur when 
the task is unfinished or unsolved. We should expect also, then, 




that when one of the subjects in this experiment reaches an impasse 
he should move more than when active in his pursuit of a solution, 
but less than he would when the problem is actually solved and 
finished. Inspection of the results indicates that this is probably 
true, although the differences between the shift periods and the solution 
periods are not quite large enough to be judged as statistically 
reliable. 


Summing up, it can be said that the results of this experiment 
support Clark’s observation that the movements of a person trying 
to solve problems “seemed to have no bearing on the solving of the 
various problems but was simply a bodily adjustment for the better 
concentration of attention.” 


(Manuscript received April 9, 1941) 


REFERENCES 


1. Bonser, F. G., A study of the relations between mental activity and the circulation of the 
blood, Psychol. Rev., 1903, 10, 120-138. 
2. Burtt, H. E., Motor concomitants of the association reaction, Psychol. Bull., 1934, 31, 
671-672. 
3. Clark, R. S., An experimental study of silent thinking, Arch. Psychol., 1922, 7, 5-101. 
4. Courten, H. C., Involuntary movements of tongue, Yale Psychol. Studies, 1902, 10, 93-96. 
5. Curtis, H. S., Automatic movements of the larynx, Amer. J. Psychol., 1900, 11, 237-239. 
6. Gamble, E. A. M., Attention and thoracic breathing, Amer. J. Psychol., 1905, 16, 261-292. 
7. Garvey, C. R., The activity of young children during sleep: an objective study, Institute of 
Child Welfare, Monograph No. 18, Minneapolis: Univ. Minn. Press, 1939. 
8. Irwin, O. C., Activities of newborn infants during the first ten days of life, in Readings in 
experimental psychology, edited by W. L. Valentine, New York: Harper & Bros., 1931. 
9. Johnson, H. M., Sleep, in Readings in experimental psychology, edited by W. L. Valentine, 
New York: Harper & Bros., 1931, 241-291. 
10. Johnson, H. M., & Swan, T. H., Sleep, Psychol. Bull., 1930, 27, 1-39. 
11. Lewin, K., A dynamic theory of personality (trans. by D. K. Adams & K. E. Zener), New York: 
McGraw-Hill, 1935. 
12. Pintner, R., Inner speech during silent reading, Psychol. Rev., 1913, 20, 129-153. 
13. Pritchard, E. A. B., The electromyogram of voluntary movements in man, Brain, 1930, 53, 
344-375. 
14. Renshaw, S., & Weiss, A. P., Apparatus for measuring changes in bodily posture, Amer. 
J. Psychol., 1926, 37, 261-267. 
15. Scheck, M. G., Involuntary tongue movements under varying stimuli, Proc. Iowa Acad. Sci., 
1925, 32, 386-391. 
16. Störring, G., Experimentelle Untersuchungen über einfache Schlussprozesse, Arch. ges. 
Psychol., 1908, 9, 1-127. 
17. Terman, L. M., & Merrill, M. A., Measuring intelligence, New York: Houghton Mifflin 
Co., 1937. 




STUDIES OF ABNORMAL BEHAVIOR IN THE RAT. 
VII. THE PERMANENT NATURE OF ABNORMAL 
FIXATIONS AND THEIR RELATION TO 
CONVULSIVE TENDENCIES¹ 


BY NORMAN R. F. MAIER AND JAMES B. KLEE 


INTRODUCTION 


Maier, Glaser and Klee (6) have shown that some rats develop 
strong position habits in certain frustrating situations (a preliminary 
period of failure to solve) which require a choice between a pair of 
cards. These habits became so persistent that the animal failed to 
choose the positive in preference to the negative card in the Lashley 
jumping apparatus (2) despite the fact that the animal associated 
punishment with one and not with the other. The presence of an 
acquired differentiation between the cards was shown by the fact 
that when the positive card was on the side of the position preference 
the animals jumped normally and readily, but when the negative 
card was on the position side the animals jumped abortively and 
showed resistance. The authors regarded the persistent habit as an 
abnormal fixation since the habit was not disturbed by differential 
reward and punishment. 

In a few cases experiments were performed to determine means 
for breaking the fixated response and it was found that guidance 
(manual interference with jumping to the position side) was effective. 
As soon as the position response was broken such animals showed a 
perfect discrimination between the positive and negative cards. 

It was also found that individuals differed in their predisposition 
to form position fixations. In the two frustration situations 12 
animals developed fixations whereas 9 did not. (A non-frustration 
situation produced one fixation in 10 rats.) The animals could be 
classified into two groups: those who have fixation tendencies and 
those not having them. 

The present investigation deals with continued experiments with 
the above animals (1) to determine the degree of permanence of the 
fixations and (2) to find whether the individual differences were 
associated with convulsive tendencies.² 


1 This study is part of a research program supported by a grant to the senior author from 
the John and Mary R. Markle Foundation, New York City. 

2 Maier and Glaser (5) found an important hereditary factor in the abnormal seizures 
associated with auditory stimuli and it is possible that this same factor is related to fixation 
tendencies. 




STUDIES OF ABNORMAL BEHAVIOR IN THE RAT 381 


PROCEDURE 


To determine the persistence of the fixations a long period of testing was undertaken. To 
prevent escape from the situation (see Maier (3)) the jumping apparatus was altered in that the 
jumping platform was enclosed by transparent walls on all sides except the front. As in the 
previous investigations air was applied to force a response. The application of air was in stages 
as follows: (1) no air for thirty seconds, (2) mild air blast for thirty seconds, (3) medium air blast 
for thirty seconds, and (4) full air blast (10-14 pounds) for the remainder of the resistance period. 
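
The staged application of air may be written out as a simple schedule. The sketch below is illustrative only; the function name and the treatment of the exact boundary seconds are assumptions, the text giving only the duration of each stage.

```python
def air_stage(elapsed_seconds):
    """Return the air condition in force after `elapsed_seconds` of resistance,
    following the four stages described above (boundary handling assumed)."""
    if elapsed_seconds < 30:
        return "no air"
    if elapsed_seconds < 60:
        return "mild air blast"
    if elapsed_seconds < 90:
        return "medium air blast"
    return "full air blast (10-14 pounds)"


for t in (10, 45, 75, 120):
    print(t, air_stage(t))
```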

The experiences to which the animals were subjected were as follows: 


A. Vacation, four months or more. On completion of the previous experiment the animals 
remained in their cages and received no tests of any kind. 

B. Discrimination problem, ten days with ten trials per day. The problem was the same 
as used in the earlier experiment. The positive card consisted of a black circle on a white back- 
ground and the negative card, of a white circle on a black background. All non-fixated animals 
had learned this problem and all fixated animals responded on the basis of a position habit. 

C. One-window situation, ten days with ten trials per day. Maier (3) has shown that this 
situation contains an element of conflict since only one card is presented and the animal must 
respond to it regardless of whether it is the positive or the negative card. 

D. Vacation, twelve days. (Procedure as in condition A.) 

E. One-window situation, five days. (Procedure as in condition C.) 

F. Mixed series, thirty days with ten trials per day. In this series the three following test 
conditions were utilized: one day, the discrimination problem; the next, the two-window situation 
with identical cards (either two negative or two positive cards), and the third, the one-window 
problem. In thirty days each of the procedures was used for ten days. 

G. Test period and metrazol injections, thirty-two to forty-nine days. Daily tests were 
made on the discrimination problem in the case of fixated rats and the one-window problem in 
the case of non-fixated rats during the afternoon, and metrazol injections, administered intra- 
peritoneally (for details see Maier and Sacks (8)), were given on the mornings of certain days. 
Until the convulsive threshold was determined the injections were given at weekly intervals 
(to avoid development of tolerance) and thereafter on alternate days. The number of injections 
varied from four to thirteen because some of the animals died during convulsions. An average 
of 9.6 injections was given and these produced an average of 5.7 convulsions (range 1-10). 

Animals Used.—Of the 31 rats used in the previous experiments complete data for the 
present experiment were obtained from 21 rats; 10 of these were fixation animals and 11, 
non-fixation animals. 


RESULTS 


I. On the Permanence of the Behavior Tendencies of Fixated and 
Non-fixated Animals 


Of the 10 fixated rats, 7 completely retained their position habit 
fixations and their pattern of abortive jumping throughout the test 
program despite convulsive experience produced by metrazol and 
by other factors in the situation. Three rats lost their fixation, 2 
during the first discrimination period (condition B) and 1 during the 
mixed series (condition F). Of the 3 rats which lost the fixation 2 
did so permanently. The other rat formed a discrimination habit 
during condition B, reverted to the former position habit during 
condition F, continued the position response for ninety trials and 
returned to the discrimination habit at the beginning of condition G. 

Despite the fact that the battery of tests favored the formation 
of a discrimination response this group tended to continue the un- 




382 NORMAN R. F. MAIER AND JAMES B. KLEE 


adaptive position habit. The results therefore demonstrate the 
strikingly permanent nature of position fixations. 

The 11 non-fixated rats may be divided into two groups on the 
basis of their previous experience. Group A consisted of 6 rats which 
had been frustrated in the same manner as the above-mentioned 
fixated group, but which did not develop the persistent position 
habits. Instead, they successfully learned the discrimination prob- 
lem when this was required. Group B consisted of 5 rats which were 
trained to form a position habit. When the problem was changed 
and a discrimination response was demanded, these animals readily 
displaced their position habits with the discrimination habit. Both 
groups thus are alike in that they entered the present experiment with 
a discrimination response and without a position fixation. 

All of these rats regained their discrimination habit during condi- 
tion B and made an average of 5.2 errors before again attaining per- 
fect discrimination. Only 1 of the rats abandoned this mode of 
response by developing a position habit during the mixed series 
(condition F). This positional response was retained for ninety 
successive trials and was then displaced by the former discrimination 
response. 

Since the discrimination response was not directly punished in 
the tests used in the present study its stability might be anticipated. 
However, it is of interest to note that the stressful factors encountered 
failed to develop even temporary changes in the behavior tendencies 
of 10 of the 11 animals. 

Thus despite the rather strenuous and varied experiences the 
majority of animals tended to express the same behavior patterns 
they had expressed during the first investigation. 

The fact that metrazol failed to disturb the learned patterns 
(some of which were abnormal adjustments) in any observable manner 
is inconsistent with any theory which makes the therapeutic effect 
of metrazol dependent upon a disruption or disorganization of past 
experience. 


II. Relation between ‘Neurotic’ Attacks and Fixation Tendencies 


In the previous experiment the rats used in the present study 
failed to show attacks in the discrimination problem. That some of 
them did so in the present study is probably due to the addition of 
the enclosed box to the jumping platform since Maier and Glaser (4) 
found this to be a factor. 

It was found that 5 of the 10 fixated rats and that 6 of the 11
non-fixated rats were susceptible to attacks. Since susceptible 
individuals occur in about equal numbers in the two groups, it 


STUDIES OF ABNORMAL BEHAVIOR IN THE RAT 383 


appears that the predisposition to develop fixations and to show 
‘neurotic’ seizures are unrelated factors. 

However, the groups also show certain differences. In Table 1,
the number of attacks per day and the number of rats having attacks
are given for each of the test conditions. It will be seen that the
average number of rats having seizures in each test condition is 2.8
for the fixated group and 4.5 for the non-fixated group. The fre-
quency with which the susceptible animals react is also different in
the two groups. In all test situations the 5 susceptible rats in the
fixated group showed a total of forty seizures (an average of 8 per rat)
or an average of .47 attacks per test day, whereas the 6 susceptible
animals of the non-fixated group showed a total of one hundred
twenty-five seizures (an average of 20.8 per rat) or an average of 2.4
attacks per test day. The difference in the two groups is largely due
to the fact that the fixated group shows a falling off in seizures as
testing continues, whereas the non-fixated group shows no such trend.
It appears that the fixated group develops some kind of adjustment
and it is reasonable to suppose that the fixation furnished this adjust-
ment. This possibility was suggested by Maier (3) since he believed
that the persistent position response gave the animal a mode of
behaving in a conflict situation.


TABLE 1
RELATION BETWEEN ‘NEUROTIC’ ATTACKS AND FIXATION TENDENCIES

                              Ten Fixated Rats           Eleven Non-fixated Rats
                           Attacks     No. of Rats       Attacks     No. of Rats
Test Situation             per Day     Having 1 or       per Day     Having 1 or
                                       More Attacks                  More Attacks
B. Discrimination
   (after vacation)          1.6            5              2.1            4
C. One-window                 .6            3              2.3            5
D. One-window
   (after vacation)          2.0            5              3.6            5
F. Mixed series
   Discrimination        [illegible]        1               .5            3
   One-window                 .4            2              2.8            6
   2 Identical cards          .2            1              2.8            4
Average                      .47           2.8             2.4           4.5
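The per-rat averages and the mean numbers of rats having attacks follow directly from the counts reported in the text and in Table 1. A minimal Python sketch of that arithmetic is given here; the variable and function names are illustrative assumptions and are not part of the original report, and only figures stated in the source are used.

```python
# Seizure totals and numbers of susceptible rats, as stated in the text.
# The dictionary layout and all names are illustrative assumptions.
groups = {
    "fixated": {"seizures": 40, "susceptible_rats": 5},
    "non_fixated": {"seizures": 125, "susceptible_rats": 6},
}


def seizures_per_rat(group):
    """Average number of seizures per susceptible rat."""
    return group["seizures"] / group["susceptible_rats"]


per_rat = {name: round(seizures_per_rat(g), 1) for name, g in groups.items()}

# Number of rats having one or more attacks in each of the six
# test situations of Table 1 (B, C, D, and the three parts of F).
rats_fixated = [5, 3, 5, 1, 2, 1]
rats_non_fixated = [4, 5, 5, 3, 6, 4]

mean_rats_fixated = round(sum(rats_fixated) / len(rats_fixated), 1)
mean_rats_non_fixated = round(sum(rats_non_fixated) / len(rats_non_fixated), 1)

print(per_rat)
print(mean_rats_fixated, mean_rats_non_fixated)
```

The attacks-per-day column is not rechecked here because the number of test days in each condition is not given in this section.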

That a vacation period is a factor of some importance can be seen 
by comparing conditions C and D. After a vacation period the one- 
window situation produced more attacks per day than before, for 
each of the groups. It is perhaps the vacation that causes the first 
discrimination period (condition B) to be as effective as it is. 

In the mixed series the vacation factor is excluded and the effec- 
tiveness of the three test conditions may be compared under like 




384 NORMAN R. F. MAIER AND JAMES B. KLEE 


circumstances. For the non-fixated rats the discrimination problem 
produced only .5 attacks per day whereas the one-window and the
two identical card tests produced an average of 2.8 attacks each, per 
test day. It is also of interest to note that the discrimination prob- 
lem produced attacks in only 3 rats whereas the one-window situation 
produced them in 6 animals. Both the one-window and the identical 
card situations may be expected to be disturbing to non-fixated rats 
since both require the animal to choose the negative card when it is 
presented. 

In the mixed series none of the situations is very effective for 
producing attacks in the fixated rats although the one-window situa- 
tion is somewhat more effective than the other two. It is in this 
situation only that a positional response cannot be expressed. How- 
ever, the discrimination situation as well as the other two contains 
an element of conflict for these rats since the negative card is on the 
side of position preference in half the trials and on such trials the two- 
window problem is functionally a one-window situation. This may 
explain why the type of situation is less of a differentiating factor in 
this group than in the non-fixated group. 

It appears that the mixed series interferes with the development 
of adjustments for the non-fixated group. The animals of this 
group react differentially to the stimulus cards and therefore have 
difficulty in developing a consistent mode of behaving. The fixated 
rats, however, react less to the cards and more to position. This 
latter mode of behaving is not greatly disrupted when the series is 
mixed and, as a consequence, the adjustment can proceed. In such 
cases the one-window situation would be the most disturbing of the 
three conditions, but even in these cases the animal may respond on a 
position basis by jumping to the right or the left of the stimulus card. 
One rat showed this very clearly. Whenever one card was exposed, 
this rat showed exactly the same kind of right position response as 
when two windows were present thus causing him to entirely miss 
the window. 

To further analyze the part played by the situation one may de- 
termine whether the seizures were associated with occasions when the 
rats were forced to jump to the negative or to the positive card. 

Since the fixated group always responded on the basis of position 
(despite the fact that they differentiated the cards) we may use the 
data in all situations, but since the non-fixated group usually chose 
the positive card in the discrimination problem we will use only the 
data from situations which forced them to choose either the positive 
or negative card. 

The results of this analysis are shown in Table 2. It will be seen 
that the fixated group had 3.4 times as many attacks when forced
to the negative card as when forced to the positive card. For the
non-fixated group the ratio is 7.1 or more than twice as great. Even
when the total data for the non-fixation group are used the ratio is 
3.3. (Despite the fact that the negative card is seldom chosen in the 
discrimination situation by the non-fixated group, it is associated 
with as many seizures as is the positive card. As a matter of fact 
nearly all of the jumps to the negative card were associated with 
attacks.) 


TABLE 2
DISTRIBUTION OF SEIZURES

                                             Total Number of Attacks
                                         Fixated Rats    Non-fixated Rats
Forced to jump to negative card               31                85
Forced to jump to positive card                9                12
Ratio (negative card : positive card)         3.4               7.1


Since the non-fixated rats were relatively more affected by the 
negative card than were the fixated rats the role of conflict again 
becomes apparent. The element of conflict is most obvious when 
the negative and positive stimulus cards are differentiated and we 
have already found this differentiation to be dominant in the non- 
fixated group, but subservient to the position habit in the fixated 
group. 
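The ratios given in Table 2 can be recomputed from the attack counts. The short Python sketch below does so; the names are illustrative assumptions, and the counts are those printed in the table.

```python
# Attack counts from Table 2 (situations which forced the jump).
# The data layout and names are illustrative assumptions.
attacks = {
    "fixated": {"negative": 31, "positive": 9},
    "non_fixated": {"negative": 85, "positive": 12},
}


def negative_to_positive_ratio(counts):
    """Attacks on forced jumps to the negative card per attack on
    forced jumps to the positive card."""
    return counts["negative"] / counts["positive"]


ratios = {
    group: round(negative_to_positive_ratio(counts), 1)
    for group, counts in attacks.items()
}

# How much larger the non-fixated ratio is than the fixated ratio.
relative_size = ratios["non_fixated"] / ratios["fixated"]

print(ratios)
print(round(relative_size, 2))
```

The relative size comes out slightly above 2, which agrees with the statement that the ratio for the non-fixated group is more than twice as great.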

Certain rats may be singled out for special consideration. These
animals reached a point where they entirely refused to jump to either
the negative or the positive card. When a full blast of air failed to
cause a response after five minutes they were tapped and pushed
vigorously with a stick until a response occurred. Three animals
(all litter mates and non-fixated) had attacks under these circum-
stances and their attacks occurred primarily when air and pushing
were used to force a response to the negative card. For example,
during the mixed series these rats had a total of one attack when air
alone was used to force a response to the positive card, a total of four
attacks when both air and pushing were used to force a response to 
the positive card, and a total of thirty-eight attacks when air and 
pushing were used to force a response to the negative card. 

These results can hardly be explained by saying that the air 
produces the seizures. Rather a number of factors must operate 
together and these become clear only when we grant the role of con- 
flict. In all instances the duration of the air stimulation was more 
than adequate to produce seizures but it was only when the re- 
sistance was broken by an additional agent that seizures occurred in 
any number. 




Three fixated animals were likewise pushed but they failed to 
have attacks and merely jumped abortively after the resistance was 
broken. Whether these animals had compensating adjustments or 
whether the effective manner in which resistance is broken is an 
individual matter cannot be said. Certainly the mere overcoming 
of resistance to jumping does not guarantee a seizure, otherwise 
electric shock would be an effective manner for producing them 
(Maier (3)). 

Retiring cage behavior, which Maier (3) found to be associated 
with neurotic attacks, was also observed. Of the 11 rats having 
seizures during these experiments 8 showed it to a marked degree 
and to the end of the experimental period. Of the 10 rats having no
attacks, none showed this behavior, not even after metrazol was used 
to produce convulsions in them. 


III. The Effect of Metrazol on the ‘Neurotic’ Pattern


We have already pointed out that metrazol was administered 
to the rats in order to determine whether it would alter the acquired 
behavior tendencies of the animal. We wish now to determine 
whether or not it affected the occurrence of attacks in the psycho- 
logical tests. For this analysis we are limited to animals which 
survived several injections and which showed the ‘neurotic’ pattern³
either previous to or during the period of metrazol administration. 
In order that the effect of metrazol on the frequency of ‘neurotic’ 
attacks may be determined we have compared the metrazol test period 
with a previous period of similar length and nature. 

During the metrazol period the 10 rats had a total of three hun-
dred and nineteen test days (average 31.9), received a total of one 
hundred injections, and showed a total of sixty-one metrazol con- 
vulsions. During this period a total of forty-seven ‘neurotic’ 
attacks occurred in the situation as compared with twenty-seven 
which occurred during an equivalent period of three hundred and 
eleven test days. Of the 10 rats, 6 had ‘neurotic’ attacks before as 
well as during the metrazol test period, 2 had attacks previously, 
but not during the period, and 2 animals showed their first attacks 
(a total of three) during the metrazol test period. 


3 Since seizures were produced by metrazol as well as in the jumping apparatus we here
refer to the seizures produced by metrazol as convulsions and those produced in the jumping
apparatus as ‘neurotic’ attacks. Although these two types of seizures have certain differences
and certain similarities (Maier and Sacks (7) and (8)), we do not wish the above terminology to
emphasize the importance of the difference in pattern. Rather the two terms are used here
merely to avoid confusion in the discussion of results. Both types of reactions may be convulsive
in nature and the terminology should not be interpreted as diagnostic or as related to human
abnormalities.




The data have been analyzed to determine whether the increase 
in the frequency of attacks varied with the number of injections 
or with the number of metrazol-produced convulsions. No accumu- 
lative effect, however, was found. For example, when the test 
period is divided into four parts and attack-frequency is expressed 
in terms of the ratio between the total number of attacks and the 
total number of days tested, the group figures for the four sub-periods 
are respectively, .15, .13, .15, and .18. Other comparisons gave 
similar results. 

When the data are analyzed to determine the proportion of attacks 
which occurred when the animal was forced to respond to the negative 
and when forced to respond to the positive card, a further difference 
between the metrazol and the control periods becomes apparent. 

The ratio between the number of seizures to the positive and 
negative cards for the metrazol period is 11 : 36, and for the control 
period, 1:26. The metrazol period thus tends to increase the 
number of attacks to each of the cards by the same absolute amount 
(ten attacks) and thus reduces the relatively greater effectiveness of 
the negative card. 

Since metrazol was not given daily we may compare the results 
of injection days with non-injection days. When this is done we find 
that fifteen ‘neurotic’ attacks occurred on one hundred injection
days (15%) and thirty-two occurred on two hundred and nineteen
non-injection days (14.6%). Thus, the frequency of attacks
for injection and non-injection days is strikingly the same. In 
terms of the positive and negative cards, however, some difference is 
apparent. The number of attacks in the positive and negative 
card situations are five and ten, respectively (ratio 1:2), for the 
injection days, and six and twenty-six (ratio 1 : 4.3) for the non- 
injection days. This is consistent with the previously noted finding 
that metrazol reduces the relative frequency of attacks to the nega- 
tive card. 
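The counts quoted for the metrazol period, the control period, and the injection and non-injection days can be checked against one another. The Python sketch below is offered only as such a consistency check; the data layout and function names are assumptions made for illustration, and every number is taken from the text (the 74 percent figure is the one stated in the summary).

```python
# Attack counts and test days quoted in the text.
# The dictionary layout and names are illustrative assumptions.
metrazol_period = {"positive": 11, "negative": 36, "days": 319}
control_period = {"positive": 1, "negative": 26, "days": 311}
injection_days = {"positive": 5, "negative": 10, "days": 100}
non_injection_days = {"positive": 6, "negative": 26, "days": 219}


def attacks(period):
    """Total attacks to both cards."""
    return period["positive"] + period["negative"]


def percent_of_days(period):
    """Attacks expressed as a percentage of test days."""
    return round(100 * attacks(period) / period["days"], 1)


def negative_per_positive(period):
    """Attacks to the negative card per attack to the positive card."""
    return round(period["negative"] / period["positive"], 1)


# Increase in the number of attacks from control to metrazol period.
percent_increase = round(
    100 * (attacks(metrazol_period) - attacks(control_period))
    / attacks(control_period)
)

# Injection and non-injection days should add up to the metrazol period.
parts_sum_to_whole = all(
    injection_days[key] + non_injection_days[key] == metrazol_period[key]
    for key in ("positive", "negative", "days")
)

print(attacks(metrazol_period), attacks(control_period), percent_increase)
print(percent_of_days(injection_days), percent_of_days(non_injection_days))
print(negative_per_positive(injection_days),
      negative_per_positive(non_injection_days))
print(parts_sum_to_whole)
```

Under these figures the two sub-totals add up to the totals for the metrazol period, and the increase of ten attacks to each card accounts for the change from 1 : 26 to 11 : 36.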

Since some metrazol injections produced convulsions and others 
did not we may also compare the numbers of ‘neurotic’ attacks occur- 
ring under these two conditions. When this is done we find that 
eight of the fifteen ‘neurotic’ attacks occurred on days on which the 
metrazol produced convulsions and seven occurred on days when it did 
not produce them. Thus the metrazol rather than its convulsion
seems to be the determining factor. However, the eight attacks
occurring on days when the metrazol produced convulsions were
shown by 3 rats, whereas the seven occurring on non-convulsing
days were shown by 5 rats, two of which had never before reacted to
the psychological situation. In terms of the number of rats affected
it would seem that the non-convulsing dose of metrazol is more
disturbing psychologically than a convulsing dose.

The results of the metrazol test period can be understood if we
suppose that the drug produces a temporary psychological state
which makes the animal more susceptible to certain auditory irritants.
That there is a physiological basis for such a psychological state is
shown by experiments of Gellhorn and Darrow (1) who found that
metrazol (either with or without convulsive manifestations) in-
creased reflex sympathetic excitability. Forcing the animal to choose
the negative rather than the positive card produces similar tensions 
and thus we may account for this situation’s effectiveness for rats 
when not given the drug. 


SUMMARY AND CONCLUSIONS 


1. Rats which previously had developed fixations in a frustrating 
discrimination problem tended to retain these fixations throughout 
another battery of tests involving both one-window and two-window 
procedures. Rats, which failed to develop position fixations in the 
previous experiment either with or without the element of frustration, 
tended neither to develop them nor to regress to them in the new test 
program. Since the test program was designed to be stressful in that 
it contained conflict and convulsion producing factors as well as 
punishment for positional responses the failure to disturb the position 
fixations in a majority of the rats demonstrates their strikingly 
permanent nature. In order to account for the appearance and non- 
appearance of fixations it is not only necessary to recognize the part 
played by the situation but also the importance of individual sus- 
ceptibility to fixations. 

2. The tendency to develop fixations and the tendency to show 
‘neurotic’ attacks in certain test situations (which utilize the hiss 
of air to force a response) seem to be unrelated tendencies. However, 
fixated rats show a greater reduction in the frequency of attacks as 
testing is continued. It appears that rats with fixations are better 
able to adjust to the attack-producing situation, indicating that the 
fixation is a form of adjustment to the frustrating conditions. 

3. Throughout the experiments, attacks tend to be associated 
with the situation which requires the rat to jump to the negative 
training card rather than one which forces a jump to the positive 
card (the hiss of air being present in both). Since the requirement of 
choosing the negative card is a form of conflict from which there is no 
apparent escape, the view that conflict is one of the factors in the 
production of attacks gains further support. In three animals 
seizures were produced when their resistance was broken by the 
addition of manual pushing with a stick. 




4. Repeated injections of metrazol (about half of which pro- 
duced convulsions) had no effect on fixations or abortive jumping, 
nor did they in any way alter the behavior in the jumping apparatus. 
Metrazol injections did, however, appear to have some effect on the 
occurrence of seizures in the jumping apparatus. Two rats which
had failed to show seizures otherwise, did so after metrazol injections. 
Two rats which previously showed them did not do so during the 
metrazol period, and 6 showed them during both periods. That 2 
new rats were caused to have attacks is more significant than failure 
to find them in 2 previously susceptible animals since adaptation is 
fairly common whereas the opposite is very rare. Metrazol also 
resulted in a 74 percent increase in the frequency of the attacks and 
reduced the difference between the relative effectiveness of the 
positive and negative card situations. 

No accumulative effect of the metrazol was noted. 

The effect of the metrazol seems to be one of producing a tem- 
porary psychological condition which increases the animals’ ir- 
ritability. 


(Manuscript received April 23, 1941) 


REFERENCES 


1. GELLHORN, E., & DARROW, C. W., The action of metrazol on the autonomic nervous system,
Arch. Internat. Pharmacodynamie et de Therapie, 1939, 62, 114-128.

2. LASHLEY, K. S., The mechanism of vision. I. A method for rapid analysis of pattern vision
in the rat, J. genet. Psychol., 1930, 37, 453-460.

3. MAIER, N. R. F., Studies of abnormal behavior in the rat. I. The neurotic pattern and an
analysis of the situation which produces it, New York, Harper and Bros., 1939, pp. 81.

4. MAIER, N. R. F., & GLASER, N. M., Studies of abnormal behavior in the rat. II. A com-
parison of some convulsion producing situations, Comp. Psychol. Monog., 1940, 16, 30.

5. MAIER, N. R. F., & GLASER, N. M., Studies of abnormal behavior in the rat. V. The
inheritance of the ‘neurotic pattern,’ J. comp. Psychol., 1940, 30, 413-418.

6. MAIER, N. R. F., GLASER, N. M., & KLEE, J. B., Studies of abnormal behavior in the rat.
III. The development of behavior fixations through frustration, J. exp. Psychol., 1940,
26, 521-546.

7. MAIER, N. R. F., & SACKS, J., Metrazol induced convulsions in normal and ‘neurotic’
rats, Film, Psychol. Cinema Register, Bethlehem, 1940, 350 ft, 16 mm.

8. MAIER, N. R. F., & SACKS, J., Studies of abnormal behavior in the rat. VI. Patterns of
convulsive reactions to metrazol, J. comp. Psychol. (In press).




SOME QUANTITATIVE PROPERTIES OF ANXIETY 


BY W. K. ESTES AND B. F. SKINNER 


University of Minnesota 


Anxiety has at least two defining characteristics: (1) it is an 
emotional state, somewhat resembling fear, and (2) the disturbing 
stimulus which is principally responsible does not precede or accom- 
pany the state but is ‘anticipated’ in the future. 

Both characteristics need clarification, whether they are applied 
to the behavior of man or, as in the present study, to a lower organ- 
ism. One difficulty lies in accounting for behavior which arises in 
‘anticipation’ of a future event. Since a stimulus which has not yet 
occurred cannot act as a cause, we must look for a current variable. 
An analogy with the typical conditioning experiment, in which S1,
having in the past been followed by S2, now leads to an ‘anticipatory’
response to S2, puts the matter in good scientific order because it is a
current stimulus S1, not the future occurrence of S2, which produces
the reaction. Past instances of S2 have played their part in bringing
this about, but it is not S2 which is currently responsible.

Although the temporal relationships of classical conditioning pro-
vide for an acceptable definition of anticipation, the analogy with
anxiety is not complete. In anxiety, the response which is developed
to S1 need not be like the original response to S2. In a broader sense,
then, anticipation must be defined as a reaction to a current stimulus
S1 which arises from the fact that S1 has in the past been followed by
S2, where the reaction is not necessarily that which was originally
made to S2. The magnitude of the reaction to S1 at any moment
during its presentation may depend upon the previous temporal
relations of S1 and S2.

The concept of ‘emotional state’ also needs clarification in view 
of the experiments to be described. It has been suggested elsewhere 
(3) that in treating emotion purely as reaction (either of the auto- 
nomic effectors or of the skeletal musculature), a very important 
influence upon operant behavior is overlooked. In practice we are 
most often interested in the effect of a stimulus in altering the strength 
of behavior that is frequently otherwise unrelated to the emotion. 
A stimulus giving rise to ‘fear,’ for example, may lead to muscular 
reactions (including facial expression, startle, and so on) and a wide- 
spread autonomic reaction of the sort commonly emphasized in the 
study of emotion; but of greater importance in certain respects is the 
considerable change in the tendencies of the organism to react in 




various other ways. Some responses in its current repertoire will be 
strengthened, others weakened, in varying degrees. Our concern is 
most often with anxiety observed in this way, as an effect upon the 
normal behavior of the organism, rather than with a specific supple-
mentary response in the strict sense of the term.

The experiments to be described follow this interpretation. An
emotional state is set up in ‘anticipation’ of a disturbing stimulus,
and the magnitude of the emotion is measured by its effect upon the
strength of certain hunger-motivated behavior, more specifically
upon the rate with which a rat makes an arbitrary response which is
periodically reinforced with food. Such a rate has been shown to be a
very sensitive indicator of the strength of behavior under a variety
of circumstances (1), and it is adapted here to the case of emotion.
Mowrer’s recent summary of techniques for measuring the ‘expecta-
tion’ of a stimulus does not include a comparable procedure (2).
In these experiments the disturbing stimulus to be “anticipated” was an electric shock
delivered from a condenser through grids in the floor of the experimental box. The stimulus
which characteristically preceded the disturbing stimulus and which therefore became the occasion
for anxiety was a tone, produced by phones attached to a 60 cycle A.C. transformer.

The apparatus, which provided for the simultaneous investigation of twenty-four rats, has
been described in detail elsewhere (1, 3). Each rat was enclosed during the experimental period
in a light-proof and nearly sound-proof box containing a lever which could be easily depressed.
A curve (number of responses vs. time) for each rat and mechanically averaged curves for the
group and for certain sub-groups of six or twelve rats were recorded. Under the procedure of
periodic reconditioning, the control clock was set to reinforce single responses to the lever every
four minutes, intervening responses going unreinforced. The rats came to respond at a relatively
constant rate during the one-hour experimental period, and the summated response curves tended
to approximate straight lines, except for local cyclic effects resulting from a temporal discrimina-
tion based upon the four-minute period of reinforcement. Curves A and C in Fig. 4 are for groups
of twenty-four rats and represent the sort of baseline available for the observation of the effect
of anxiety.

The subjects were twenty-four male albino rats under six months of age, taken from an
unselected laboratory stock. Records were taken for one hour daily during the entire experiment.
After preliminary conditioning of the pressing response, two sub-groups were formed; one group of
twelve rats was kept at a relatively high drive, while the other twelve were held at a drive which
produced a very low rate of responding. The sound and shock were first introduced after two
weeks of periodic reinforcement.
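The procedure of periodic reconditioning described above, in which a control clock reinforces single responses every four minutes and intervening responses go unreinforced, may be illustrated by a small Python sketch. The source does not give the details of the clock, so the sketch assumes that reinforcement becomes available at each four-minute mark, that the first response at or after a mark is reinforced, and that missed marks do not accumulate. The function name and the response times are hypothetical.

```python
def reinforced_responses(response_times, interval=4.0, session_length=60.0):
    """Return the response times (in minutes) that would be reinforced.

    Assumptions (not stated in the source): reinforcement becomes
    available at interval, 2 * interval, ...; the first response at or
    after such a mark is reinforced; unused marks do not accumulate.
    """
    reinforced = []
    next_available = interval
    for t in sorted(response_times):
        if t > session_length:
            break
        if t >= next_available:
            reinforced.append(t)
            # Advance the clock to the first mark later than this response.
            while next_available <= t:
                next_available += interval
    return reinforced


# Hypothetical steady responder: one response every half minute.
steady = [0.5 * i for i in range(1, 121)]
steady_reinforced = reinforced_responses(steady)

# Hypothetical sparse responder, as at a very low drive.
sparse_reinforced = reinforced_responses([1.0, 9.0, 10.0, 30.0])

print(len(steady_reinforced), steady_reinforced[:3])
print(sparse_reinforced)
```

Under these assumptions a steady responder is reinforced once per four-minute period, while a sparse responder is reinforced only on the first response after each elapsed mark.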


CONDITIONING OF A STATE OF ANXIETY 


The averaged periodic curve for twelve rats on a high drive on the 
occasion of the first presentations of the tone (T) and shock (S) is
shown in Fig. 1. On this first presentation the tone was allowed to 
sound for three minutes. Each rat was then given a shock and the 
tone was stopped. It will be observed that neither the tone nor the 
shock (at the intensity used throughout the experiment) produced 
any disturbance in the mean periodic rate at either presentation. 
This orderly base-line made it possible to follow with ease the de- 
velopment of the ‘anticipation’ of the shock during subsequent 
repetitions of the situation. 




The tone-shock combination was presented twice during each of 
six consecutive hourly periods. Then, in order to clarify any changes 
in the behavior, the period of the tone was lengthened to five minutes 
and the combination was given only once during each ensuing ex- 
perimental hour. 

The principal result of this part of the experiment was the condi-
tioning of a state of anxiety to the tone, where the primary index was
a reduction in strength of the hunger-motivated lever-pressing be-
havior. The ratio of the number of responses made during the period
of the tone to the average number made during the same fraction of
the hour in control experiments was 1.2 : 1.0¹ for the first experi-
mental hour; it had dropped to 0.3 : 1.0 by the eighth.
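The index used here, the number of responses made during the tone relative to the average number made during the same fraction of the hour in control experiments, can be sketched in Python as follows. The function names and the response times are hypothetical; the times were invented solely so that the example reproduces a ratio of 0.3 : 1.0 and are not data from the experiment.

```python
def count_in_window(response_times, start, duration):
    """Number of responses falling in the interval [start, start + duration)."""
    return sum(1 for t in response_times if start <= t < start + duration)


def tone_to_control_ratio(tone_session, control_sessions, tone_start, tone_duration):
    """Responses during the tone divided by the mean number of responses
    made during the same part of the hour in control sessions."""
    tone_count = count_in_window(tone_session, tone_start, tone_duration)
    control_counts = [
        count_in_window(session, tone_start, tone_duration)
        for session in control_sessions
    ]
    control_mean = sum(control_counts) / len(control_counts)
    if control_mean == 0:
        raise ValueError("no control responses in the comparison window")
    return tone_count / control_mean


# Hypothetical control hour: one response every half minute.
control = [0.5 * i for i in range(120)]

# Hypothetical tone hour: the same record, but with only three responses
# during a five-minute tone beginning at minute 30.
tone_hour = [t for t in control if not (30 <= t < 35)] + [30.5, 32.0, 34.0]

ratio = tone_to_control_ratio(tone_hour, [control], tone_start=30, tone_duration=5)
print(ratio)
```

A ratio near 1.0 indicates no effect of the tone, and a ratio well below 1.0 indicates a reduction in the strength of the behavior during the tone.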


[Figure: responses plotted against time (one hour).]

Fig. 1. First presentations of tone and shock. Mechanically averaged curves for twelve
rats under periodic reinforcement. The tone was turned on at T and at S the shock was ad-
ministered and the tone turned off. There is no noticeable effect of either tone or shock upon the
rate of responding at this stage.


The changes in behavior accompanying anticipation of the shock 
are shown in Fig. 2, which gives the averaged curves for the group of 
six rats with the highest periodic rate during the first four days of the 
five-minute tone. A number of characteristics of these records should 
be noted. The progressively more marked reduction in periodic 
rate during the anticipatory period is obvious. The effect upon the 
rate is felt immediately after the presentation of the tone and remains 
at a constant value until the shock is given. (This constancy might 
not be maintained if the situation were repeated often enough to 
allow the rat to form a temporal discrimination.) Effects also appear 


1 The ratio is not expected to be exactly 1 : 1 since the number of responses made during a 
period of five minutes will depend upon where the period begins with respect to the four-minute 
interval of reinforcement. 




after the shock, which were not present in Fig. 1 as the result of the 
shock alone. Especially in Curves A and B of Fig. 2, the shock is
seen to be followed by a depression and irregularity of rate which are 
at least much greater than any effect in the control records. With 
continued repetition of the experiment, this disturbance tends to 
adapt out, although not completely. In Curves C and D of Fig. 2, 


[Figure: responses (scale mark, 100 responses) plotted against time (each record one hour).]

Fig. 2. Reduction in rate of responding during successive periods of anxiety. Averaged
curves for six rats on four consecutive days. By the third or fourth day responding practically
ceases during the presentation of the tone.


the distortion is much less marked. Curve B of Fig. 4 gives a similar 
example at a relatively late stage of conditioning. 

The modification in behavior correlated with the anticipation of a 
disturbing stimulus cannot be attributed to a negative reinforcement 
of the response to the lever, since the shock was always given inde- 
pendently of the rat’s behavior with respect to the lever. Only 
upon rare occasions could the shock have coincided with a response. 
This was especially true in the experiments upon the group at a lower
drive, where a similar effect was obtained. Figure 3 shows averaged 
curves for a group of six rats which had been subjected to the pro- 
cedure just described except that their drive was so low during condi- 




tioning that the rate of responding was virtually zero. The lower 
curve in Fig. 3 is for the first day on which the five-minute rather than 
the three-minute period was given. Up to and including this record, 
no effect of the anticipation of the shock could be detected, since the 
animals were not responding at a significant rate. The drive was 
then raised, and the upper curve in Fig. 3 shows the performance of 
the same group on the following day. By sighting along the curve, 
one may observe a marked depression in the rate of responding during 


[Figure: responses plotted against time (each record one hour); T and S mark the tone and the shock.]

Fig. 3. Reduction in rate during anxiety following experiments at a very low drive. The
lower record is a curve for six rats at a very low drive but otherwise comparable with the curves
in Figs. 1 and 2. The upper curve is for the same group at a higher drive on the following day.
The tone has an obvious effect, although all previous presentations have been made at a drive
so low that no effect was observable.


the period of the tone. Comparison with Curve B in Fig. 2 shows 
that although the base line at the higher drive is more irregular, a 
depression of relatively the same magnitude is obtained. In this 
case, coincidental presentations of shock and response may safely 
be ignored, yet the tone has acquired the same depressing effect upon 
the behavior. | 

, Another characteristic which deserves attention is the compensa- 
tory increase in periodic rate following the period of depression. 
This appears to some extent in all records obtained; but it may be 
seen most clearly in Curve B of Fig. 4, a periodic curve for all 24 
rats after the emotional conditioning was quite complete. The 
curve was obtained about two weeks after the records in Fig. 2. 


fw 


SOME QUANTITATIVE PROPERTIES OF ANXIETY 395 


Curves A and C are controls taken (at a slightly higher drive) on
adjacent days. By sighting along Curve B, one may observe a clear-
cut increase in rate subsequent to the shock, which continues until
the extrapolation of the curve preceding the break is reached. Evi-
dently the effect of the emotional state is a temporary depression of
the strength of the behavior, the total amount of responding during
the experimental period (the ‘reserve’) remaining the same. Similar
compensatory increases have been described under a number of
circumstances, including physical restraint of the response (3).


RESPONSES


TIME (EACH RECORD ONE HOUR)


Fig. 4. Subsequent compensation for the reduction in rate during anxiety. The curves
are averages for twenty-four rats taken on three consecutive days. A and C were taken under
periodic reinforcement, while B shows the effect of the tone at a late stage in the experiment.
The reduction in rate is followed by a compensatory increase, bringing the curve back to the ex-
trapolation of the first part.
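The compensation described here can be checked numerically by extrapolating the part of a cumulative record that precedes the break and comparing the extrapolation with the observed total at the end of the hour. The following is a minimal sketch in Python and is not the authors' procedure, which relied on sighting along the curve. The function name `extrapolation_gap` and the sample record are hypothetical and serve only to illustrate the computation.

```python
def extrapolation_gap(times, cumulative, break_time):
    """Fit a straight line to the cumulative record up to break_time and
    return (extrapolated final count) minus (observed final count).

    A gap near zero means the curve has returned to the extrapolation of
    its first part; a positive gap means responses were lost.
    """
    pre = [(t, c) for t, c in zip(times, cumulative) if t <= break_time]
    n = len(pre)
    mean_t = sum(t for t, _ in pre) / n
    mean_c = sum(c for _, c in pre) / n
    # Ordinary least-squares slope and intercept of the pre-break segment.
    slope = (sum((t - mean_t) * (c - mean_c) for t, c in pre)
             / sum((t - mean_t) ** 2 for t, _ in pre))
    intercept = mean_c - slope * mean_t
    return (intercept + slope * times[-1]) - cumulative[-1]


# Hypothetical record (not data from the paper): a steady rate for 30 min,
# a depressed rate during a 5-min tone, then a raised rate until the
# extrapolation is regained, after which the original rate resumes.
times = [0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60]
record = [0, 10, 20, 30, 40, 50, 60, 62.5, 75, 87.5, 100, 110, 120]
gap = extrapolation_gap(times, record, break_time=30)
print(gap)
```

A record in which the rate merely returned to its former value after the tone, with no compensatory increase, would leave a positive gap equal to the number of responses lost during the depression.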


EFFECTS OF ANXIETY UPON EXTINCTION


When reinforcement with food is withheld, the rat continues to 
respond, but with a declining rate, and describes the typical extinc- 




396 W. K. ESTES AND B. F. SKINNER 


tion curve. The effects of anxiety upon this curve have been in- 
vestigated. The first hour of a typical extinction curve, during 
which the combination of tone and shock was presented, is shown in 
the group curves of Fig. 5 and the individual curves of Fig. 6. By 
sighting along either curve in Fig. 5, one may observe a distinct 
depression in rate during the period of the tone, and (following the 
shock) an equally distinct compensatory increase, which appears 
to be maintained until an extrapolation to the first part of the curve is 
approximated. Figure 6 contains sample records from four rats 
which showed different degrees of depression during the tone. 


RESPONSES 


TIME (EACH RECORD ONE HOUR) 


Fig. 5. Effects of anxiety upon extinction. The lower curve is an average for six rats, the
upper for twelve. The tone, which had previously been followed by shock during periodic 
reinforcement, depresses the slope of the extinction curves, and a compensatory increase follows 
the administration of the shock. 


During extinction, then, a state of anxiety produces a decrease in 
the rate of responding and the terminating stimulus is followed by 
such a compensatory increase in rate that the final height of the curve 
is probably not modified. 


THE EXTINCTION OF A STATE OF ANXIETY


A further property of anxiety was investigated by presenting the
tone for a prolonged period without the terminal shock. In one
experiment, while the rats were responding under periodic reinforce-
ment, the tone was turned on after twenty-seven minutes of the
experimental period had elapsed and allowed to sound for the re-
mainder of the hour. The result is shown in Figs. 7 and 8. It will
be observed that the recovery of a normal periodic rate is delayed
considerably over the accustomed five-minute period of the tone.
When the time is taken from the onset of the tone to the point at
which the rat again reaches his previous periodic rate (measurements
being made on individual curves), the mean period required for re-
covery is found to be 8.6 minutes. The group curve for twelve
rats (the upper record in Fig. 7) shows a definite compensatory in-
crease in rate later in the hour, although the extrapolation of the first
part of the curve is not quite reached by the end of the period.


[Fig. 6. Axes: RESPONSES against TIME (EACH RECORD ONE HOUR).]


RESPONSES 


TIME (EACH RECORD ONE HOUR) 


Fig. 7. Extinction of the effect of a tone when the terminating shock fails to appear. The
upper record is the averaged curve for twelve rats under periodic reinforcement. The tone was 
turned on at T and continued to sound during the rest of the hour. No shock was given. The 
rate of responding returns to normal (and perhaps shows some compensatory increase) within 
ten minutes. The lower curve shows a repetition of the experiment ten days later. 


The same experiment was repeated ten days later at a somewhat 
lower drive with the result shown in the lower curve in Fig. 7. The 
mean delay in recovery is here 9.1 minutes, and recovery is less com- 
plete. Except for the effects of the difference in motivation, the 
two records appear quite similar and exemplify the reproducibility of 
behavior of this sort. 

Because the period of depressed activity varies among rats, indi- 
vidual records are needed in order to observe the course of the re- 
covery of normal strength during the extinction of anxiety. Figure 
8 shows a number of individual records with different periodic rates, 
the differences being attributable mainly to differences in hunger. 
The lag in recovery appears in nearly all records, and the compensa- 
tory increase in periodic rate in the majority. In some curves, 
notably E, F, and G, an extrapolation of the first part is reached
before the end of the hour. It is not clear that this would have been 




TIME (EACH RECORD ONE HOUR) 


Fig. 8. Individual curves from the experiment described in Fig. 7.


the case with the other rats if the experimental period could have 
been prolonged, but the curves in general appear to be positively 
accelerated.

Spontaneous recovery from the extinction of the anxiety is fairly 
complete. The daily record which preceded the upper figure in Fig. 7 
showed a ratio of 0.6 : 1.0 between the average periodic rate during 
the period of the tone and the normal rate for such an interval. 




On the day following the figure, the ratio was 0.7 : 1.0 for a similar 
period, indicating that little or no effect of extinction survived. 
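The ratios quoted above compare the average periodic rate during the tone with the normal rate for an interval of the same length. As a brief illustration only, the sketch below computes such a ratio from response counts. The function name `tone_to_normal_ratio` and the counts are hypothetical; the paper reports the ratios but not the counts from which they were obtained.

```python
def tone_to_normal_ratio(tone_responses, tone_minutes,
                         normal_responses, normal_minutes):
    """Ratio of the response rate during the tone to the normal rate."""
    tone_rate = tone_responses / tone_minutes
    normal_rate = normal_responses / normal_minutes
    return tone_rate / normal_rate


# Hypothetical counts: 30 responses during a 5-min tone against 50
# responses in a comparable 5-min interval without the tone.
ratio = tone_to_normal_ratio(30, 5, 50, 5)
print(f"{ratio:.1f} : 1.0")
```

A ratio of 1.0 would indicate no depression at all, so that a return to a ratio well below 1.0 on the following day is what marks the recovery of the anxiety.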


SUMMARY 


Anxiety is here defined as an emotional state arising in response 
to some current stimulus which in the past has been followed by a 
disturbing stimulus. The magnitude of the state is measured by its 
effect upon the strength of hunger-motivated behavior, in this case 
the rate with which rats pressed a lever under periodic reinforcement 
with food. Repeated presentations of a tone terminated by an 
electric shock produced a state of anxiety in response to the tone, the 
primary index being a reduction in strength of the hunger-motivated 
behavior during the period of the tone. When the shock was thus 
preceded by a period of anxiety it produced a much more extensive 
disturbance in behavior than an ‘unanticipated’ shock. The de- 
pression of the rate of responding during anxiety was characteristically 
followed by a compensatory increase in rate. 

During experimental extinction of the response to the lever the 
tone produced a decrease in the rate of responding, and the terminat- 
ing shock was followed by a compensatory increase in rate which 
probably restored the original projected height of the extinction 
curve. 

The conditioned anxiety state was extinguished when the tone 
was presented for a prolonged period without the terminating shock. 
Spontaneous recovery from this extinction was nearly complete on 
the following day.


(Manuscript received April 18, 1941) 


REFERENCES 


1. Heron, W. T., & Skinner, B. F., An apparatus for the study of animal behavior, Psychol. 
Rec., 1939, 3, 166-176. 

2. Mowrer, O. H., Preparatory set (expectancy)—some methods of measurement, Psychol. 
Monogr., 1940, 52. Pp. 43. 

3. Skinner, B. F., The behavior of organisms: an experimental analysis, New York: Appleton- 
Century, 1938. 


PERCEPTUAL PRINCIPLES INVOLVED IN THE DISIN- 
TEGRATION OF A CONFIGURATION FORMED 
IN PREDICTING THE OCCURRENCE OF 
PATTERNS SELECTED BY CHANCE 


BY G. K. YACORZYNSKI 


Northwestern University Medical School 


Experimental work on perception from the gestalt viewpoint 
has been concerned with the development of laws of organization of 
stimuli into perceptual configurations. Not only do individuals 
tend to respond to external stimulus situations in terms of configura- 
tions which obey certain laws of perception, but also they tend to show 
preference for predictable configurations if they are required to 
anticipate the outcome of stimulus patterns selected by chance in a 
temporal order. Goodfellow (3) from an analysis of the data of the 
Zenith radio experiment on telepathy concluded that the general 
population shows a preponderant preference for non-symmetrical 
patterns if the individuals are asked to name the outcome of a pattern 
obtained by selecting by chance one of two objects a number of times. 
He (4) has shown that the preference for non-symmetry in predicting 
the outcome of patterns is a general principle which is operative in 
many other situations besides that of the Zenith experiment such as 
in true and false tests and experiments in psychophysics. 

For a long time it has been known that psychotics, especially those 
classified as schizophrenics, tend to form perceptual configurations 
which are different from normal. In most of these experiments the 
patient has been instructed to interpret an external stimulus situa- 
tion, such as the Rorschach figures, or to reproduce a stimulus pattern, 
such as the figures used by Wertheimer. The complexity of some 
of these situations which have been used makes it impossible to state 
what perceptual principles are involved in the configurations formed 
by the normal individual, and, therefore, the principles of organiza-
tion which are affected in the psychotic remain unknown.²

The value of testing whether psychotics differ from normal indi- 
viduals in their preference for selecting one of two stimuli chosen by 
chance in a temporal order lies in the fact that the principle which is 


1 From the Department of Nervous and Mental Diseases. Part of a report presented to the 
Illinois Psychiatric Society, Mar. 7, 1940. 

2 It is to be noted that although Bender (1) finds that psychotics reproduce the Wertheimer
figures differently from normal individuals, the perceptual disorganization of the psychotics does 
not involve the gestalt laws of perception which these figures are purported to demonstrate. 




used by the normal individual to organize these stimuli is known. If 
the psychotics differ from the normals it will be possible to state the 
exact perceptual process which is affected. In addition, the results 
which are to be reported show that the psychotics substitute a 
different principle of organization from the normal principle of 
preference for non-symmetry. Knowledge of the principles used by 
the psychotics in organizing stimuli should yield a better understand- 
ing of the manner in which the normal perceptual processes function 
and disintegrate. 


METHODS AND SUBJECTS 


To compare the results of this experiment with the results obtained from the Zenith radio 
broadcast some of their methods were used with the modification that the subjects were tested 
individually. Four sets of stimuli used by the Zenith radio broadcast, heads and tails (pennies), 
circle and square (cut out of cardboard), circle and cross (cut out of cardboard), and black and 
white (marbles), were used in this experiment. The subject was told that this was an experiment 
in his ability to guess the results obtained by a chance selection of one of two objects. The object 
thus selected by the experimenter was, of course, unknown to the subject and he was not told 
whether his guess was correct. To further approximate the Zenith study five trials were given 
for each of the four sets of objects. 

This procedure was used with 40 controls, 20 manic-depressives, and 20 schizophrenics.³
The average ages of these groups in the above order were 35.6 (range 16-59), 40.1 (range 19-60), 
and 30.4 (range 18-50) years. Their educational training expressed as the total number of 
grades finished in grade school, high school, and college were 8.2 (range 4-14), 9.2 (range 0-15), 
and 11.0 (range 8-16) grades. The sexes were evenly distributed in the control group. The 
manic-depressives consisted of 5 males and 15 females, and the schizophrenics of 6 males and
14 females. 


RESULTS 


Table 1 shows the percent distribution on the 16 patterns (the 
total number which can possibly occur by selecting one of two things 
five times) of the population studied by Goodfellow,⁴ and the sub-
jects used in this study. In the table the number of symmetries as
defined by Goodfellow for each of the sixteen patterns is also given.
For example, the third pattern, 11221, which would be obtained if the
subject said heads, heads, tails, tails, heads, or vice versa, would have
three symmetries as follows: 11, 22, and 1221. The fifth pattern,
12112, would also have three symmetries, 11, 121, and 2112, plus an
additional symmetry since the third number in the series is the same
as the first. This last additional symmetry, i.e., the first and third
numbers in the series being the same, was derived from some of
Goodfellow’s work on psychophysical comparisons.
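The counting rule can be made concrete with a short sketch. The rule coded below is an assumption reconstructed from the two worked examples: every run of two or more consecutive responses that reads the same forwards and backwards counts as one symmetry, and one further symmetry is added when the first and third responses agree. It is not quoted from Goodfellow, but it reproduces the number of symmetries listed for each of the sixteen patterns in Table 1. The function names are hypothetical.

```python
from itertools import product


def canonical(pattern):
    """Relabel a pattern so that it begins with '1'; the two objects are
    interchangeable ('heads, heads, tails ... or vice versa')."""
    if pattern[0] == "1":
        return pattern
    return pattern.translate(str.maketrans("12", "21"))


def symmetries(pattern):
    """Count symmetries under the assumed rule described in the lead-in."""
    count = 0
    n = len(pattern)
    for start in range(n):
        for end in range(start + 2, n + 1):
            piece = pattern[start:end]
            if piece == piece[::-1]:  # reads the same in both directions
                count += 1
    if pattern[0] == pattern[2]:      # first and third responses agree
        count += 1
    return count


# The 32 possible five-trial sequences collapse to 16 patterns.
patterns = sorted({canonical("".join(p)) for p in product("12", repeat=5)})

# Number of symmetries as listed in Table 1.
table_counts = {
    "11212": 3, "12212": 3, "11221": 3, "12211": 3,
    "12112": 4, "12221": 4, "12122": 4, "11222": 4, "11211": 4,
    "11122": 5, "12121": 5, "11121": 5, "12111": 5,
    "12222": 6, "11112": 7, "11111": 11,
}

for p in patterns:
    print(p, symmetries(p), table_counts[p])
```

Treating the two objects as interchangeable is what reduces the 32 possible sequences to the 16 patterns of the table.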

3 Appreciation is expressed to Dr. C. A. Neymann, Chief of Staff at the Cook County Psycho-
pathic Hospital, for collaborating on this project by selecting the psychotic cases. The controls
were selected from the Northwestern University Medical School clinic on the basis of absence of
pathology.

4 Goodfellow’s (3) data presented in Table 1 were compiled from Table 3 published by him.
Since only four sets of stimuli, heads and tails, circle and square, circle and cross, and black and
white, were used in this study, only the averages of the percent distribution on these four sets of
objects of the population studied by Goodfellow are given in Table 1.




TABLE 1


SHOWING THE PATTERNS, THEIR SYMMETRIES, THE PERCENT DISTRIBUTION OF THE GENERAL
POPULATION REPORTED BY GOODFELLOW, AND THE PERCENT DISTRIBUTION OF THE CONTROLS,
MANIC-DEPRESSIVES, AND SCHIZOPHRENICS USED IN THE PRESENT STUDY


             No. of       % General                  % Manic-       % Schizo-
Patterns     Symmetries   Population    % Controls   Depressives    phrenics
11212            3           15.3          15.0          7.5            8.8
12212            3           12.1           8.8         13.8            7.5
11221            3           11.6           7.8          7.5            6.2
12211            3           10.7          13.1          8.8            3.8
12112            4            9.0           6.2         10.0            7.5
12221            4            6.3           8.1          1.2            5.0
12122            4            5.7           5.0          3.8            3.8
11222            4            5.0           5.0          0.0            1.2
11211            4            4.8          10.6          3.8           11.2
11122            5            4.3           5.6          6.2            3.8
12121            5            4.3           6.2         17.5           18.8
11121            5            3.9           5.0          5.0            7.5
12111            5            3.1           1.2          5.0            5.0
12222            6            1.9           1.2          1.2            1.2
11112            7            1.3           0.0          3.8            2.5
11111           11            0.7           1.2          5.0            6.2


In order to see what relationships exist between the patterns 
preferred by the different groups of subjects, correlation coefficients 
were computed between the frequencies with which the 16 patterns 
were used. In Table 2 it is seen that the controls used in this study 
showed a preference for patterns similar to the patterns selected by 
the population studied by Goodfellow, the probability that such a 


TABLE 2


SHOWING THE CORRELATIONS OF THE PATTERNS PREFERRED BY THE GENERAL POPULATION
REPORTED BY GOODFELLOW, AND THE SUBJECTS USED IN THIS STUDY


                          Controls    Manic-Depressives    Schizophrenics
General Population .....    .83*            .42                 .16
Controls ...............    .81*            .32                 .33
Manic-Depressives ......


*P


correlation could occur by chance is less than 1 in a 100 for 16 items 
(cf. 5). The preference of the two groups of normal subjects, how- 
ever, differs from that of the manic-depressives and the schizo- 
phrenics, since although the correlations between the patterns se- 
lected by the normals and the psychotics are positive they are not 
statistically significant. The manic-depressives and the schizo- 
phrenics, however, show similar preferences for patterns as can be 




seen from the highly significant correlation coefficient. These re- 
sults were independent of sex, age, or education. 

The correlation of .81 shown in the table between the controls 
used in this experiment gives the reliability obtained by dividing the 
controls at random into half and correlating the frequency with 
which the patterns were used. 
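The correlations between groups can be recomputed, approximately, from the percentages in Table 1. The sketch below assumes that the coefficients are ordinary product-moment correlations taken over the 16 patterns; the text does not name the formula, and the published percentages are rounded, so the results should agree with Table 2 only roughly. The variable names are introduced here for illustration.

```python
import math

# Percent distributions transcribed from Table 1, in the order of its rows.
general = [15.3, 12.1, 11.6, 10.7, 9.0, 6.3, 5.7, 5.0,
           4.8, 4.3, 4.3, 3.9, 3.1, 1.9, 1.3, 0.7]
controls = [15.0, 8.8, 7.8, 13.1, 6.2, 8.1, 5.0, 5.0,
            10.6, 5.6, 6.2, 5.0, 1.2, 1.2, 0.0, 1.2]
manic = [7.5, 13.8, 7.5, 8.8, 10.0, 1.2, 3.8, 0.0,
         3.8, 6.2, 17.5, 5.0, 5.0, 1.2, 3.8, 5.0]
schizo = [8.8, 7.5, 6.2, 3.8, 7.5, 5.0, 3.8, 1.2,
          11.2, 3.8, 18.8, 7.5, 5.0, 1.2, 2.5, 6.2]


def pearson(x, y):
    """Product-moment correlation between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)


groups = {"general": general, "controls": controls,
          "manic": manic, "schizo": schizo}
names = list(groups)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        r = pearson(groups[first], groups[second])
        print(f"{first:8s} vs {second:8s}  r = {r:.2f}")
```

The split-half reliability of .81 for the controls cannot be recomputed in this way, since it depends on the individual records rather than on the group percentages.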


DIscussION 


The similarity between the general population and the controls 
used in this experiment in predicting the patterns which will occur 
from a chance selection of one of two objects made five times in a 
temporal order, shows that the same principle underlying the pre- 
diction was used by both groups of subjects. Since Goodfellow (3) 
has shown that under these conditions the general population shows a 
preference for non-symmetrical patterns, it is evident that the same 
principle was operative for the controls used in this study. 

The manic-depressives and the schizophrenics, however, do not 
predict the occurrence of similar patterns to that of the normal 
individuals. The atypical patterning of the responses of the psy- 
chotics might be due to either one of two reasons. In the first place, 
it might mean that the normal preference for non-symmetry has 
simply disintegrated in the psychotics, or, in the second place, it 
might mean that the psychotics use a different principle of organiza- 
tion from the normals which replaces the principle of preference 
for non-symmetry. If the former condition, that in the psychotics 
there is simply a deterioration of the normal integrative process, is 
true then only chance would govern the kinds of patterns selected 
by them. If, on the other hand, a definite principle of selection is 
substituted for the normal principle of non-symmetry then prefer- 
ence for certain patterns should exist for the psychotics. The fact 
that there is a significant correlation between the patterns selected 
by the manic-depressives and the schizophrenics indicates that some 
principle of organization was operative in the predictions made by 
the psychotics which replaced the normal principle of preference for 
non-symmetrical patterns. 
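The chance alternative can be given a rough numerical form. If chance alone governed the selections, each of the 16 patterns would be expected in 1/16 of the predictions. The sketch below, which is an illustration and not an analysis made in this study, computes the binomial probability of a pattern being chosen at least as often as the 12121 sequence was chosen by each psychotic group. It assumes that the 80 predictions of a group (20 subjects, four sets of objects each) are independent, which is a simplification, since each subject contributed four of them.

```python
from math import comb


def upper_tail(n, k, p):
    """Probability of k or more successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


p_chance = 1 / 16          # 16 equally likely patterns under pure chance
n_predictions = 20 * 4     # 20 subjects, four sets of objects each

# Counts implied by the percentages for pattern 12121 in Table 1.
k_manic = round(0.175 * n_predictions)    # manic-depressives, 17.5 percent
k_schizo = round(0.188 * n_predictions)   # schizophrenics, 18.8 percent

tail_manic = upper_tail(n_predictions, k_manic, p_chance)
tail_schizo = upper_tail(n_predictions, k_schizo, p_chance)
print(k_manic, tail_manic)
print(k_schizo, tail_schizo)
```

Under these assumptions a frequency of this size would be unusual if chance alone were operating, which is in keeping with the argument from the correlation between the two psychotic groups.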

The most preferred pattern of the psychotics was the 12121 
sequence. This is one of the less preferred patterns of the normals.
In fact Goodfellow (4), referring to some of his more recent unpub-
lished work, states “that the arrangement ABABA is avoided by
most people . . . although the tendency to change a response due
to one’s previous response should make this sequence popular.”

An explanation for the result that the pattern 12121 is most 
preferred by psychotics, but less preferred by normals, is suggested




by Fernberger’s (2) work on psychophysics with lifted weights. 
He showed that individuals tend to reverse their successive judg- 
ments, which in effect means that they show a preference for the 
12121 sequence. His data appear to be in direct contradiction to the 
results reported by Goodfellow and the results of the normal subjects 
used in this study. The discrepancy, however, can be explained 
by the fact that there was no necessity for Fernberger’s subjects to 
inhibit a natural sequence of responses in order to match their results 
with what might occur when objects are selected by chance. None 
of Fernberger’s subjects were aware of the fact that their responses 
tended to follow a definite pattern, and the same pattern persisted 
even though the subjects were informed as to the nature of their 
responses. This spontaneity of response in the absence of external 
inhibitory conditions and without awareness or the ability to inhibit 
the response on the part of the subject argues for the conclusion that 
the 12121 sequence is a natural configuration which will occur when 
there are no external conditions to modify it. If the individuals are 
asked to predict the outcome of a pattern which is to be selected by 
chance, their spontaneous tendency to respond will be modified in an 
attempt to guess the nature of the pattern which will be thus selected. 
Evidently logic dictates that a pattern which is selected by chance 
will not be symmetrical. The normal subjects, therefore, predict 
that the outcome will be a non-symmetrical pattern. The preference 
of the psychotics for the 12121 pattern would mean, if the above 
interpretations are correct, that the perceptual processes of the 
psychotics remain relatively uninfluenced by the fact that they are to 
predict the outcome of a chance selection. Even in the presence of 
these restricting conditions they respond with the more basic patterns 
which are used by the normals when the additional influence of 
attempting to match one’s responses with what will occur by chance 
is removed. 


SUMMARY 


Controls, manic-depressives, and schizophrenics were required to 
predict the pattern which would occur when one of two objects was 
selected by chance five times. It had been previously shown that 
under similar conditions the general population shows a preference for 
non-symmetrical patterns. 

The controls showed a preference for the same patterns as the 
general population. The two psychotic groups differed from the 
controls or the general population in the patterns which they selected 
but resembled one another. 

It is concluded that the controls showed a preference for non- 
symmetrical patterns, and that the psychotics used a different 




principle of selection from that of the controls, which was substituted for the
principle of non-symmetry.

The most preferred pattern of the psychotics is the 12121 sequence
which is among the less preferred patterns of the normals. Since this 
sequence is selected by normal individuals when they are not re- 
quired to predict the pattern which will occur from chance selection, 
it is suggested that the psychotics rather than modifying their 
responses to conform to the special conditions demanded by the 
instructions resort to more basic patterns which are used by normals 
in the absence of any inhibitory conditions. 


(Manuscript received April 28, 1941) 


REFERENCES 


1. BENDER, L., Principles of gestalt in copied form in mentally defective and schizophrenic
persons, Arch. Neurol. Psychiat., Chicago, 1932, 27, 661-686.

2. FERNBERGER, S. W., Interdependence of judgments within the series for the method of constant
stimuli, J. exp. Psychol., 1920, 3, 126-150.

3. GOODFELLOW, L. D., A psychological interpretation of the results of the Zenith radio experi-
ments in telepathy, J. exp. Psychol., 1938, 23, 601-632.

4. GOODFELLOW, L. D., The human element in probability, J. gen. Psychol., 1940, 23, 201-205.

5. LINDQUIST, E. F., Statistical Analysis in Educational Research, Houghton Mifflin Co., 1940.




A NOTE ON NON-INFORMATIVE SHOCK 


BY JACK BERNARD 


Several studies have been reported in recent years concerning the 
effect of non-informative punishment for error, and non-informative 
reward for correct response, upon learning. Working independently, 
three workers almost simultaneously reported studies in this field. 
Eisenson (3), in a multiple-choice situation with human subjects, 
found that non-informative reward, while less effective than informa- 
tive reward in stimulating the repetition of correct responses, still had 
a decided positive effect. In the same experiment he found informa- 
tive and non-informative punishment for errors to be about equally 
effective. Bunch (2) reported data from a stylus maze experiment 
which indicated that the effectiveness of shock for error is only partly 
attributable to its informative value. That effectiveness over and
above its informative value she called the ‘punishment’ value. 
Gilbert (4), also working with a stylus maze problem, drew the 
cautious conclusion that “electric shock may, under certain condi- 
tions at least, have a facilitating effect upon learning apart from its 
guidance value, and that there is evidence (although less positive) 
that learning motivated in this way is followed by better retention 
than that which follows less highly motivated learning.” He later 
confirmed this finding in a punchboard maze situation (5). 

During the progress of a study reported in a recent paper (1), 
the writer and Gilbert ran a group of 24 subjects who learned the 
stylus maze employed in that experiment without shock for error. 
This additional group was intended to throw further light upon the 
problems of non-informative shock, was not in any way concerned 
with the main study of shock specificity, and was therefore not 
discussed in the report on that experiment. The main experimental
group received shock in connection with certain of the culs-de-sac, 
and no shock in others. The control group was run on the hypothesis 
that a comparison between the rate of elimination of non-shock errors 
in the main group and the rate of elimination of the same errors in the 
non-shock group would be equivalent to a comparison between a 
non-informative shock group and a non-shock group, since the main 
group was receiving shocks which gave no information as to non-shock 
errors. If it were found that non-shock errors in the main group 
were eliminated more rapidly than the same errors in the control 
group, it would constitute additional evidence as to the effectiveness 
of non-informative shock in learning. The following is a report
on this study. 



SUBJECTS, APPARATUS AND PROCEDURE


Subjects.—The subjects were the 52 undergraduate students of the main experimental group 
described in an earlier paper (1), which group will henceforth be referred to as the experimental 
group, and an additional control group of 24 students who were in all respects similar to the first.
The experimental group was divided equally into two sub-groups, designated Shock ‘O’ and 
Shock ‘X’ respectively. 

Apparatus.—The stylus maze employed was the Fox maze of the main study, details of which,
together with the electrical set-up for administering shock to the experimental group, are de- 
scribed in a previous paper (1). 

Procedure.—Alternate culs-de-sac in the maze were designated ‘O’ and ‘X.’ The Shock ‘O’
sub-group in the experimental group received shock upon contacting the ends of ‘O’ culs-de-sac, 
and no shock in ‘X’ culs. The procedure was reversed with Shock ‘X.’ Only contacts with the 
ends of blinds were counted as errors, and retracing was prevented by means of a verbal signal. 
Full details of the procedure followed with the experimental group are given in an earlier paper (1). 

The control group learned the same maze under the same conditions as the experimental 
group, but without shock. A statement was made to each subject at the beginning of the experi- 
ment that no shock would be employed, and that he was simply to learn the maze pattern to our 
criterion (two out of three trials without error) in as few trials as possible. This statement was 
made necessary by the presence of electrical apparatus used for administering shock to the 
experimental group. 

Only twenty trials were given each subject, although all believed that they were to work 
until the criterion of learning was met. The records of subjects who met our criterion within 
twenty trials, or failed to complete twenty trials in the allotted time of 1 hour, were discarded to
avoid weighting the last few trials, when their data would drop out. In this manner five subjects 
out of an original fifty-seven were lost in the experimental, and three out of twenty-seven in the 
control group. 


RESULTS 


The criteria used in comparing the performance of the experi- 
mental and control group were: 


(1) The total number of non-shock errors made during the course of
learning.

(2) The total number of non-shock errors made on ‘X’ blinds.

(3) The total number of non-shock errors made on ‘O’ blinds.

(4) The ratio of all non-shock errors during the first 10 trials to all
non-shock errors during the second 10 trials. This error
ratio is a measure of progress in learning, and is explained in
more detail in a previous paper (1).

(5) The error ratio (as defined above) of non-shock ‘X’ blinds.

(6) The error ratio of non-shock ‘O’ blinds.


The results of these comparisons are presented in Table I. It 
will be noted that no reliable differences are revealed. 

The simple ‘error-trial’ learning curves presented in Figs. 1 and 
2 confirm the statistical finding of ‘no reliable difference’ throughout 
their entire length. They show no tendency to separate beyond 
what is easily accounted for by chance fluctuation. 
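The reliability of a difference between two means of this kind is commonly judged by dividing the difference by its standard error. The sketch below is offered as an illustration under stated assumptions, since the report does not give the formula: the standard error is taken as the square root of σ₁²/(n₁ − 1) + σ₂²/(n₂ − 1), and the non-shock ‘X’ and ‘O’ errors of the experimental group are each taken to come from one sub-group of 26 subjects, compared with the 24 controls. With these assumptions the unlabelled last column of Table I is recovered for the ‘X’ and ‘O’ error totals. The function name `critical_ratio` is introduced here.

```python
import math


def critical_ratio(mean1, sd1, n1, mean2, sd2, n2):
    """Difference between two means divided by its standard error.

    Assumed form: each variance is divided by (n - 1), which treats the
    tabled sigmas as computed with n in the denominator.
    """
    standard_error = math.sqrt(sd1**2 / (n1 - 1) + sd2**2 / (n2 - 1))
    return abs(mean1 - mean2) / standard_error


# Means and sigmas from Table I; the group sizes are assumptions (see above).
cr_x = critical_ratio(104.7, 33.3, 26, 99.8, 26.8, 24)  # non-shock 'X' errors
cr_o = critical_ratio(91.7, 24.3, 26, 87.3, 21.0, 24)   # non-shock 'O' errors
print(round(cr_x, 2), round(cr_o, 2))
```

Both ratios fall below 1, that is, the differences are smaller than their own standard errors, which agrees with the statement that no reliable differences are revealed.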




TABLE I


                                  Experimental        Control
                                  Mean      σ        Mean      σ       Diff.
All non-shock errors              98.2     30.0      95.5     24.8               .71
Non-shock ‘X’ errors             104.7     33.3      99.8     26.8      4.9      .56
Non-shock ‘O’ errors              91.7     24.3      87.3     21.0      4.4      .67
Ratio 1st 10 trials to 2nd
10 trials
  All non-shock ratio             1.76      .79      1.64      .46      .12      .83
  Non-shock ‘X’ errors            1.78      .83      1.48      .50      .30     1.33
  Non-shock ‘O’ errors            1.73      .79      1.93      2        .20      .95


DISCUSSION


The findings just presented might be interpreted at first glance as 
throwing doubt upon the efficacy of non-informative shock as an 
aid or incentive in learning. Certainly if our opening hypothesis is 
correct, that shock for one group of errors in a maze situation and 


Fig. 1. Comparison of the elimination of non-shock ‘X’ errors in the
experimental and control groups. [Axes: AV. ERRORS against TRIALS;
curves: EXPERIMENTAL and CONTROL.]


Fig. 2. Comparison of the elimination of non-shock ‘O’ errors in the
experimental and control groups. [Axes: AV. ERRORS against TRIALS;
curves: EXPERIMENTAL and CONTROL.]


not for a second group within the same maze is equivalent to non- 
informative shock for the second group, this conclusion is unavoid- 
able. It would, however, be contradictory to all previous findings 
relative to the role of non-informative shock. We must then subject 
our opening hypothesis to a more careful scrutiny. 

Upon consideration, one fact becomes immediately apparent. 
The shock administered to our experimental group was not, strictly 
speaking, non-informative. Each shock was given in connection 
with some error, even though the shock errors are not considered in 
the present report, and the subjects were aware of this significance 
of the shock. Non-informative shock, as used by Gilbert (4, 5), is 
not given in direct connection with any error. When shocks are 
administered in proportion to the number of errors made, and yet 
not in direct connection with those errors, and the subjects are cog- 
nizant of this (as were Gilbert’s subjects), the effect is likely to take 
the form of a more cautious attitude toward the situation in general 
and the significant cues within it. When shocks are given in con- 




nection with certain errors and not for others, it has been shown (1) 
with the maze pattern here used that the subjects tend to eliminate 
the shock errors more rapidly than the non-shock errors. Thus 
in this case the shock tends to concentrate the attention of the sub- 
ject upon the elimination of the shocked errors, rather than simply 
inducing a general incentive attitude as does non-informative shock. 

There are at least two possible explanations for the fact that non- 
shock errors in our experimental group were eliminated at a rate 
that is not reliably different from the rate of elimination of the same 
errors in our control group. One might be that the effect of shock 
in the experimental group was entirely specific to the shock errors, 
and had neither a positive nor negative effect upon the total situa-
tion and/or the non-shock errors. The other would be that the shock
in the experimental group induced the general incentive attitude 
which alone (as it is in non-informative shock) would tend to produce 
a more rapid elimination of errors, but that this positive effect was 
counteracted by the fact that the shock given in connection with 
certain errors and not with others tended to concentrate the attention 
of the subjects upon the elimination of the shock errors to the neglect 
of the others. Thus in this explanation both positive and negative 
effects on the elimination of non-shock errors are present, but are 
approximately equal and tend to neutralize each other. In the light 
of observations made during the course of the experiment, the writer 
tends to support the second explanation. The presence of what 
Leuba (6) terms the ‘incentive attitude,’ with its accompanying 
increase in muscular tension, cautiousness, and vigor of movement, 
was quite apparent in the experimental group as compared with the 
control group. Since this ordinarily induces generally better per- 
formance when present, we must explain its failure to improve the 
elimination of non-shock errors in the present experiment by postu- 
lating an opposing influence. This appears to lie in the very ap- 
parent concern which most subjects showed in the elimination of 
their shock errors and relative unconcern about non-shock errors, 
despite the fact that in their preliminary instructions they had been 
told that all entries into culs-de-sac would count equally against 
them, whether shocked or not. 

The results of this study may also have some interest for those 
who are concerned with the effect of the spread of ‘reward,’ ‘punishment,’
or ‘emphasis’ to neighboring connections. Here we found no
apparent effect, either positive or negative. 


SUMMARY 


In the present experiment an attempt was made to determine 
what, if any, effect shock for certain errors in a stylus maze situation
would have upon the elimination of non-shock errors in the same
maze. Fifty-two subjects, twenty-six male and twenty-six female, 
comprised the experimental group which ran twenty trials in the 
Fox maze. Learning was in no case complete. Possible differences 
in difficulty were controlled by having the shock blinds for half the 
group serve as the non-shock blinds for the other half. 

The performance of this group on the non-shock blinds was 
compared with that of a control group of twenty-four subjects, 
similar in all respects to the experimental group, but which ran the 
twenty trials without shock. Comparisons were made according to 
six criteria, none of which revealed a statistically reliable difference. 

This is interpreted as being due to the opposition of two effects 
of the shock: (1) The general incentive effect which has been demon- 
strated with non-informative shock, and (2) the tendency of punish- 
ment for error to cause the subject to concentrate upon the avoidance 
of the punished errors to the neglect of the non-punished errors, which 
is not found with non-informative shock. 


(Manuscript received April 30, 1941) 


REFERENCES 


1. Bernard, J., & Gilbert, R. W., The specificity of the effect of shock for error in maze
learning with human subjects, J. exp. Psychol., 1941, 28, 178-186.
2. Bunch, M. E., Certain effects of electric shock in learning a stylus maze, J. comp. Psychol.,
1935, 20, 211-242.
3. Eisenson, J., Confirmation and information in rewards and punishments, Arch. Psychol.,
1935, 27, No. 181.
4. Gilbert, R. W., The effect of non-informative shock upon maze learning and retention with
human subjects, J. exp. Psychol., 1936, 19, 456-466.
5. Gilbert, R. W., A further study of the effect of non-informative shock upon learning, J. exp.
Psychol., 1937, 20, 396-407.
6. Leuba, C. J., A preliminary analysis of the nature and effects of incentives, Psychol. Rev.,
1930, 37, 429-440.


STUDIES IN THERMAL SENSITIVITY: 16. FURTHER 
EVIDENCE ON THE EFFECTS OF 
STIMULUS TEMPERATURE 


BY WILLIAM LEROY JENKINS 
Lehigh University 


When the same skin area is mapped for warmth alternately with 
two different temperatures (for example, 38° and 44°), the seriatim 
scores do not show a uniform shift. Some squares of the map 
remain unaltered in score; some change slightly; while some others 
change markedly. The behavior of any single square is not predictable,
although the general alterations appear to follow certain
statistical principles. These facts have been reported in a previous
article.¹


To determine more accurately the effects of stimulus-temperature, 
it is desirable to have comparable results from a range of stimulus
temperatures on the same map. That has been the purpose of the
present study.


PROCEDURE 


The procedure in this investigation varied in several important respects from that in the 
study referred to above. 

1. Four temperatures (38°, 41°, 44° and 47°) were used in an irregular order, so that the 
subject could not know what temperature was being employed, thus eliminating any possibility 
of suggestion. 

2. The subject acted as his own recorder, entering his report numbers in a serial list. The 
experimenter followed a pre-arranged order of stimulation, which was later used to convert the 
serial list into the usual map form. 

3. Stimulation on succeeding rounds was alternated between two maps, to provide a longer 
period for recovery from adaptation. By this means, the interval between successive stimulations 
of the same square was increased to an average of 20 min., instead of 10. All of the maps were 
stamped on the volar surface on the left forearm, except for three cases where one map was on 
the forehead above the eyes. The maps on the forehead gave similar results to those on the arm. 

4. An 8-hour experimental period was used (1 to 9 p.m.) and the subjects remained in the 
laboratory during the entire time. They were given a 7 min. rest period after each hour of 
mapping, and half an hour for supper at 6 p.m. This extended period was necessary in order 
to accumulate sufficient comparable data. 

5. The basis for determining the seriatim scores was changed. Since the behavior of indi- 
vidual squares was a critical feature, it seemed desirable to institute a stricter criterion. Instead 
of adding up the six report numbers to get the seriatim score for each square, the scheme shown 
in Table 1 was employed. This is based on the clustering of reports about a certain report-level. 
When the reports in a given square failed to cluster, that square was marked ‘X’ (unscorable). 
Even a highly consistent subject may occasionally have a square that shows inconsistent results. 
By this method, such squares are automatically eliminated. 


1 Jenkins, W. L., Studies in thermal sensitivity: 15. Effects of stimulus-temperature in 
seriatim warm-mapping, J. exp. Psychol., 1941, 28, 517-523. 




TABLE 1
[Scoring scheme: the combinations of six report numbers accepted as clustering at each seriatim score (0, 3, 6, 9, 12, 15 and 18). The individual entries are not legible in this copy.]


6. The basis for selecting consistent subjects was changed. Primarily, this involved the 
elimination of any subject who showed an excessive number of unscorable squares, even when 
the general standard (reliability coefficient of .80) had been attained. 


RESULTS 


In all 12 subjects, the results confirmed those of the previous
study.² With any two temperatures, some scores changed sharply,
some slightly and others remained unaltered. The statistical principles,
however, are more clearly demonstrated; and the further
analysis is concerned with these.

The 7 most consistent subjects were selected on the basis just 
described. From their maps were discarded (a) all squares where an 
‘X’ appeared at any one of the four temperatures; (b) all cases of 
apparent reversals in the sequence (such as 6-9-6-6). There re- 
mained 202 squares which had perfectly consistent sequences; either 
the same score at all four temperatures, or an upward trend without 
reversals. These 202 squares were treated as a group for further 
analysis, since the data were insufficient for the study of individual 
differences. 
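
The selection rule just described can be written as a small filter. The sketch below is illustrative only: the function name, the use of `None` to stand for an unscorable 'X', and the sample squares are assumptions made for the example, not material from the study.

```python
from typing import Optional, Sequence


def is_consistent(scores: Sequence[Optional[int]]) -> bool:
    """Return True if a square's four scores (38°, 41°, 44°, 47°) qualify."""
    # Rule (a): an unscorable 'X' (represented here as None) disqualifies the square.
    if any(s is None for s in scores):
        return False
    # Rule (b): any drop from one temperature to the next counts as a reversal.
    return all(lo <= hi for lo, hi in zip(scores, scores[1:]))


# Hypothetical squares, chosen only to exercise each rule.
squares = {
    "same score throughout": [6, 6, 6, 6],
    "upward trend": [3, 6, 6, 12],
    "reversal (6-9-6-6)": [6, 9, 6, 6],
    "unscorable at 44°": [6, 9, None, 12],
}
for label, seq in squares.items():
    print(f"{label}: {'keep' if is_consistent(seq) else 'discard'}")
```

A square is kept only when its four scores never fall as the temperature rises, which is the sense of "perfectly consistent" used in the text.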

The distribution of seriatim scores in these squares at the four 
stimulus-temperatures is shown in Table 2. There is a general 
upward trend in the score levels, as might be expected. This con- 
ceals, however, a most striking fact: that is, the wide variety of 
patterns in the sequences of four scores. In the 202 squares, there 
were 77 different patterns. Thirty of these patterns occurred 3 or
more times, but only three of them as many as 10 times. In the
93 squares with a score of ‘6’ at 38°, there were 23 patterns, which
comes close to exhausting the total possibilities. Nine of these patterns,
furthermore, occurred 5 or more times.

2 Op. cit.


TABLE 2

                         Seriatim Scores
Stim. Temp. |    0     3     6     9    12    15    18
38°         |   18    36    93    40    12     2     1
41°         |    9    11    73    56    31    20     2
44°         |    1    14    37    57    52    32     9
47°         |    1     9    20    35    52    54    31
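
As a check on the reading of Table 2, the counts can be totalled and summarised. This is a minimal sketch; the variable names are arbitrary and the counts are those read from the table above. Each row should total the 202 squares retained for analysis, and the mean score per temperature gives one simple view of the "general upward trend".

```python
# Counts of squares at each seriatim score, as read from Table 2.
SCORES = [0, 3, 6, 9, 12, 15, 18]
TABLE_2 = {
    38: [18, 36, 93, 40, 12, 2, 1],
    41: [9, 11, 73, 56, 31, 20, 2],
    44: [1, 14, 37, 57, 52, 32, 9],
    47: [1, 9, 20, 35, 52, 54, 31],
}

# Row totals: every temperature should account for all 202 squares.
totals = {temp: sum(counts) for temp, counts in TABLE_2.items()}

# Mean seriatim score at each stimulus temperature.
means = {
    temp: sum(s * c for s, c in zip(SCORES, counts)) / totals[temp]
    for temp, counts in TABLE_2.items()
}

for temp in sorted(TABLE_2):
    print(f"{temp}°: n = {totals[temp]}, mean score = {means[temp]:.2f}")
```

The mean is only a summary; as the text goes on to stress, it hides the variety of individual four-score sequences.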


In the bar at the bottom of Fig. 1, these patterns are analyzed in 
another way. This shows that the number of cases where the score
was the same at all four temperatures is approximately equal to the 
number where there were four different scores. Similarly, the per- 
cent where one shift in score occurred (3 alike or 2 pairs) about 
matches that where two shifts occurred (one pair). 


Fig. 1. Upper part: score changes for stimulus-temperature increases of +3°, +6° and +9°, by initial score, with shading for scores increased 3 and increased 6 or more; detail for score of 6. Bottom bar: distribution of patterns (all same; three alike or two pairs; one pair; all different). [Figure not reproduced.]


The upper part of Fig. 1 shows the statistical predictability for 
specific score changes. At the left of each bar is the initial score. 
The shading shows what happens when the stimulus-temperature is 
increased 3°, 6° and 9°. Regardless of the original score and the 
original stimulating temperature, it appears: (a) a temperature increase
of 3° leaves about half of the scores unchanged; (b) an increase
of 6° leaves about one-fifth unchanged; (c) even a rise of 9° leaves
about one-tenth at the original level.

In the detail for score ‘6,’ notice how slight a difference it makes 
whether the 3° increase is from 38° to 41°, from 41° to 44° or from 44° 
to 47°. Similarly, going from 38° to 44° produces about the same 
results as changing from 41° to 47°. In general, it seems that the 
level of the original score and the amount of temperature increase 
are the only critical factors. 


DISCUSSION


What kind of a receptor system could account for such results? 
The traditional textbook statement is that there are one or two warm 
spots per sq. cm. on the forearm. On the average, then, a single
receptor should be responsible for the sensitivity of half a dozen 3
mm squares. The varying scores in such a group would be determined
by differing thermal conduction to the same receptor. In
Fig. 2, the left-hand diagram shows how neatly such a scheme can
be used to explain the map at 38°. Less heat is conducted to the
receptor from the more distant squares, and the receptor is stimulated
less strongly. In fact, with sufficient ingenuity in assuming
the location of receptors and the variations in thermal conduction,
any one map can be thus explained. But when the stimulus temperature
is increased, it is obvious that this whole section of the map
should rise in unison, since all the scores should benefit by the increased
stimulation of the central receptor. Experimentally, this
simply does not happen. No assumptions on the basis of this theory,
however ingenious, can account for the irregular shifts which do
actually occur. In short, the thermal-conduction theory cannot be
correct.

Fig. 2. Two theories of graded-action receptors: conduction theory (left), map (center) and multiple spot theory (right), at 38°, 41°, 44° and 47°. [Figure not reproduced.]


Fig. 3. The concentration theory: receptors (left) and scores (right). [Figure not reproduced.]


To get around this difficulty, we may assume the kind of scheme 
shown on the right-hand side of Fig. 2. It is not necessary to postu- 
late a rigid system of one spot per square as shown here. Any similar 
arrangement of graded-action receptors would do. The main point
is that now each square is capable of operating independently. To 
account for the 77 different sequences of scores which are found 
experimentally, it is merely necessary to assume that receptors dis- 
play 77 different kinds of graded action. A few have a regular in- 
crease in activity proportional to the increase in temperature. 
Most of them move upward in a hop, skip and jump fashion. A few 
give the same response at all four temperatures. If the objection 
is raised that such an assumption is neither logical nor parsimonious,
the answer is that this is the only way in which even a semblance of
the traditional spot theory can be maintained. 

Figure 3 shows an explanation of the experimental results in 
terms of the concentration theory. The maps at the left represent the 
hypothetical distribution of receptors—black circles indicating 
active receptors, open circles those which are present but not yet 
active. The middle column shows the same region with the squares
of the map superimposed. The right-hand column indicates the 
seriatim scores. 

The concentration theory rests upon three simple assumptions: 

(1) That the receptors are all-or-none in action. The magnitude 
of their response may vary with their physiological condition, but 
not with the strength of the stimulus, once the limen is reached. 
Each receptor is not operating at all, or else it is delivering a fixed 
impulse-frequency which is characteristic of that receptor. 

(2) That the receptors differ in their limens. That is, some of 
them are set off at a very low level; others at higher levels; and some 
few only at extreme temperatures. 

(3) That the experienced intensity of warmth depends funda- 
mentally upon the concentration of active receptors, not necessarily 
in a linear relation. 

If the limens are distributed according to some statistical prin- 
ciple, we should expect the kind of results shown in Fig. 1. Each 
increase in stimulus-temperature, on the average, should call more 
receptors into play. In any individual square, however, the number 
of receptors would not be sufficient to follow the statistical law con- 
cerning the distribution of limens. Hence, a large number of differ- 
ent patterns of response might be expected. 
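
The argument of this paragraph can be illustrated with a toy simulation. Everything numerical below is an assumption made for the sketch and not a value from the study: four receptors per square, limens drawn uniformly between 36° and 50°, and the count of active receptors standing in for the seriatim score. The point is only that all-or-none receptors with scattered limens yield sequences that never fall as temperature rises, yet differ widely from square to square.

```python
import random

TEMPS = [38, 41, 44, 47]  # the four stimulus temperatures


def simulate_square(rng, n_receptors=4, limen_lo=36.0, limen_hi=50.0):
    """Active-receptor counts for one square at each temperature.

    Each receptor is all-or-none: it is active whenever the stimulus
    temperature reaches its limen (assumed uniform on [limen_lo, limen_hi]).
    """
    limens = [rng.uniform(limen_lo, limen_hi) for _ in range(n_receptors)]
    return tuple(sum(limen <= t for limen in limens) for t in TEMPS)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
squares = [simulate_square(rng) for _ in range(202)]

patterns = set(squares)
mean_active = [sum(sq[i] for sq in squares) / len(squares) for i in range(len(TEMPS))]

print("distinct four-temperature patterns:", len(patterns))
for t, mean in zip(TEMPS, mean_active):
    print(f"{t}°: mean active receptors per square = {mean:.2f}")
```

With so few receptors per square, no single square follows the overall distribution of limens, which is the sense in which the average trend is regular while individual squares are not.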

Actually, the neurological situation cannot possibly be as simple 
as shown in Fig. 3. Other experiments now in progress indicate that 
the receptors are connected in multiple; that is, a number to a single 
neuron. Then the total frequency of impulses along that neuron 
would depend, not only upon the number of active receptors, but also 
upon how well their timing dovetails and the limits set by the refractory
phase of the neuron. Going back into the central nervous
system, the problems of the integration of these first-order neurons,
and then the second-order, etc., are evidently far beyond the scope
of our present knowledge.

Histologically, only the free nerve endings are known to be present 
in sufficient abundance to be candidates for thermal receptors. 
Some of these may be chemically differentiated as warm receptors; 
others as cold receptors; and still others for pain and possibly for 
touch. Their apparently unspecialized character would fit the 
assumption of all-or-none action. Likewise, they appear to come in 


\ 


STUDIES IN THERMAL SENSITIVITY 419 


multiple from each nerve fiber; so it would be possible to have a 
relatively large number of receptors connected to a relatively small 
number of first-order neurons. 


SUMMARY 


In amplification of an earlier study, seriatim mapping was con- 
ducted with temperatures of 38°, 41°, 44° and 47° in random order to 
eliminate any effects of suggestion. The maps of the consistent sub- 
jects showed 77 different sequences of seriatim scores, with four differ- 
ent scores occurring in as small a number of cases as four alike. 
Statistically, with an increase of 3° half of the scores remained un- 
changed, with 6° one-fifth and with 9° one-tenth, regardless of the 
original score level or the original stimulating temperature. 

If the variations in a map were dependent upon differing thermal 
conduction to a few warm spots, whole sections of the map should 
rise in unison. This does not occur. If an independent warm spot 
for each square is assumed, it is necessary to postulate that warm 
receptors are capable of acting in 77 different ways to account for the 
different sequences obtained experimentally. 

The simplest explanation is the concentration theory: (1) Each 
receptor is all-or-none in action; (2) Receptors differ in their limens; 
(3) The seriatim score depends essentially upon the concentration of 
active receptors. 


(Manuscript received April 23, 1941) 




DISCUSSION 


THE VALIDITY AND RELIABILITY OF GROUP JUDGMENTS 


BY BERNARD BABINGTON SMITH 
University of St. Andrews, Scotland 


In a recent note, (3), on ‘the validity of judgments as a function of the 
number of judges,’ Mr. Eysenck seems to have sown the seeds of misunderstanding.
He claims to have shown that “a general factor of taste
is the most important single determinant for individual aesthetic preferences,”
and continues, “It follows that the criterion will simply be the
average judgments of a large number of judges: chance errors and bipolar 
factors will balance out in the long run and leave the correct order, in the 
same way that in Dr. Gordon’s experiment (6) the average judgments of 
two hundred subjects agreed perfectly with the independent criterion, viz. 
the actual weights used.” 

If it is possible to show the existence of a general factor of esthetic 
taste, this is not sufficient warrant in itself for the statement that chance 
and bipolar factors will balance out in the long run. When Mr. Eysenck’s 
claim is re-stated in the form that “a measure shown to be reliable will in
the long run be perfectly valid,” it is more easily seen to have over-reached 
itself. Examples are given below where there is a significant and even high 
degree of agreement between judges with respect to rankings the sum of 
which differs from the criterion. 

The first point to be made is that Mr. Eysenck’s formula deals with 
reliability and not with validity. The last paragraph of his note reads, 
“This is so because we may regard the theory underlying the formula for 
measuring the percentage of causal factors as an extension of the general 
factor theory on which this research is based.” It will not have been clear
to the reader of Mr. Eysenck’s note that the formula he quotes, 


$$ r_{\bar{k}g} = \sqrt{\frac{n\,\bar{r}_{kk'}}{1 + (n-1)\,\bar{r}_{kk'}}} \tag{i} $$


as having been established by Burt in Marks of Examiners, (2), is for use 
only where the general factor theory holds and is then only approximately 
true. 

The fundamental formula is based on that derived in the first place by 
Spearman, and given in many textbooks, for the correlation between the 
sums or averages of two sets of measures. This formula may be written: 


$$ r_{\bar{k}\bar{K}} = \frac{\sqrt{mM}\,\bar{r}_{kK}}{\sqrt{\{1 + (m-1)\bar{r}_{hk}\}\{1 + (M-1)\bar{r}_{HK}\}}} \tag{ii} $$

where there are two sets of measures $(a_1 \cdots a_n,\ b_1 \cdots b_n,\ \cdots,\ k_1 \cdots k_n,\ \cdots,\ m_1 \cdots m_n)$
and $(A_1 \cdots A_n,\ B_1 \cdots B_n,\ \cdots,\ K_1 \cdots K_n,\ \cdots,\ M_1 \cdots M_n)$, m and M in number, where a bar over a symbol
indicates an average, and where the standard deviations of the measures
$a, b \cdots m, A, \cdots M$ are all equal.

From formula (ii) are derived, and are to be found in textbooks, the 
various formulae for reliability and validity. For instance, if all the 
measures are pooled and it is assumed that $r_{hk}$ is constant for all values of
h and k, (ii) becomes:

$$ \frac{mr}{1 + (m-1)r} $$

If we set $M = 1$ or $\bar{r}_{HK} = 1$, in other words, if we have a single measure,
$K_1 \cdots K_n$ in one pool (or a perfectly reliable measure), (ii) becomes:

$$ r_{\bar{k}K} = \frac{\sqrt{m}\,\bar{r}_{kK}}{\sqrt{1 + (m-1)\bar{r}_{hk}}} \quad \text{(the Spearman-Brown formula).} \tag{iii} $$
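In Marks of Examiners, Professor Burt says (§ 594, p. 303), “To obtain
a corresponding formula for the unweighted average we may use the familiar
formula for correlation of sums

$$ r_{\bar{k}g} = \frac{\sum \sigma_k r_{kg}}{\sqrt{\sum \sigma_k^{2} + \sum \sigma_k \sigma_{k'} r_{kk'}}} \tag{iv} $$

where the summation in the denominator is taken over the whole of the
n(n − 1) coefficients and $\bar{k}$ stands for the unweighted average. . . . If the
standard deviations are approximately equal the last formula reduces to

$$ r_{\bar{k}g} = \frac{n\,\bar{r}_{kg}}{\sqrt{n + n(n-1)\,\bar{r}_{kk'}}} \tag{v} $$

$$ \phantom{r_{\bar{k}g}} = \sqrt{\frac{n\,\bar{r}_{kk'}}{1 + (n-1)\,\bar{r}_{kk'}}} \tag{vi} $$

since, as we have seen, $\bar{r}_{kg} = \sqrt{\bar{r}_{kk'}}$ approximately.”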
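
The step from the exact form to the approximate one is short enough to write out. The following is a sketch only, assuming the symbols as read in this note (n examiners, $\bar{r}_{kg}$ the mean correlation of an examiner with the true marks, $\bar{r}_{kk'}$ the mean correlation between examiners); the single approximation is flagged where it enters.

```latex
\begin{align*}
r_{\bar{k}g}
  &= \frac{n\,\bar{r}_{kg}}{\sqrt{n + n(n-1)\,\bar{r}_{kk'}}}
  && \text{(equal standard deviations; exact)} \\
  &= \frac{\sqrt{n}\,\bar{r}_{kg}}{\sqrt{1 + (n-1)\,\bar{r}_{kk'}}}
  && \text{(numerator and denominator divided by } \sqrt{n}\text{)} \\
  &\approx \sqrt{\frac{n\,\bar{r}_{kk'}}{1 + (n-1)\,\bar{r}_{kk'}}}
  && \text{(only if } \bar{r}_{kg} \approx \sqrt{\bar{r}_{kk'}}\text{)}
\end{align*}
```

The first two lines are algebra; the last line holds only to the extent that the two factor assumption does.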

I have been unable to trace the demonstration of this approximate 
relationship beyond sections 574, 575, p. 280, where it is shown that
$r_{kk'} = r_{kg}\,r_{k'g}$: “If there is no specific influence common to the two examiners,
then theoretically the correlation between k and k' should be the
product of their two respective correlations with the true marks”: in other
words, if the conditions of the case satisfy the two factor theorem.

Roughly speaking, then, we can see that $\bar{r}_{kk'} = \bar{r}_{kg}^{2}$; hence $\bar{r}_{kg}$
and $\sqrt{\bar{r}_{kk'}}$ will be approximately equal. Mr. Eysenck, therefore, depends on
an approximation and the two factor theory, when an exact algebraic relationship
is available, namely the formulae first quoted by Burt ((2) v and vi
above), and before showing that the two factor theory is applicable.

If we use ranks, as has been the practice in the discussion of group 
judgments, if the standard deviations are all equal, and, in my notation, 
formula (v) becomes

$$ r_{\bar{k}K} = \frac{\sqrt{m}\,\bar{r}_{kK}}{\sqrt{1 + (m-1)\,\bar{r}_{hk}}} \tag{viii} $$

the important point, which I shall illustrate, is that $\bar{r}_{kK}$ is not necessarily
equal to $\sqrt{\bar{r}_{hk}}$. In that case Burt’s approximation fails and the two factor
theorem with it.

This formula (viii) for the correlation between one measure and the 
sum or unweighted average of m others has been given in a number of textbooks.
C. L. Hull (8) and others have pointed out that $r_{\bar{k}K}$ cannot exceed

$$ \frac{\bar{r}_{kK}}{\sqrt{\bar{r}_{hk}}} \tag{ix} $$

This can be expressed otherwise by saying that if $\bar{r}_{kK}$ is less than $\sqrt{\bar{r}_{hk}}$,
$r_{\bar{k}K}$ can never reach unity by increasing the number of judges (m), or again
that if $\bar{r}_{kK}$ is less than $\sqrt{\bar{r}_{hk}}$, the measures (or judges making them) must
differ in some important way from the true ones. It is then convenient
to say that the judges make systematic errors or are biased.
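
The behaviour of formula (viii) as judges are added, and the ceiling (ix), are easy to tabulate. The sketch below is a hedged illustration: the function names are hypothetical, and the two input values are the ones quoted for Example 1 of this note (mean correlation with the criterion .736, mean inter-correlation .545, 74 judges).

```python
import math


def pooled_validity(m: int, r_kK: float, r_hk: float) -> float:
    """Formula (viii): correlation of the pooled ranking of m judges with the criterion."""
    return math.sqrt(m) * r_kK / math.sqrt(1 + (m - 1) * r_hk)


def validity_ceiling(r_kK: float, r_hk: float) -> float:
    """Expression (ix): the limit of formula (viii) as m grows without bound."""
    return r_kK / math.sqrt(r_hk)


# Values quoted for Example 1 of this note.
r_kK, r_hk = 0.736, 0.545

for m in (1, 5, 20, 74, 1000):
    print(f"m = {m:4d}: pooled validity = {pooled_validity(m, r_kK, r_hk):.3f}")
print(f"ceiling (ix) = {validity_ceiling(r_kK, r_hk):.3f}")
```

With one judge the function returns the mean individual correlation itself; as m grows the value climbs toward the ceiling but cannot pass it, so a ceiling below unity marks bias that no number of similar judges removes.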

If we apply formula (viii) to Gordon’s data (7), we find $\bar{r}_{kK} = .401$, but
$\bar{r}_{hk} = .169$. Here, in the example which led to Eysenck’s research, Burt’s
approximation is not verified, for $r_{\bar{k}K}$ tends to .977 as m, the number of
judges, increases. The amount of bias or systematic error is admittedly 
not very great. 

Unfortunately the same treatment cannot be applied to the results 
reported by other workers such as Stroop (12), Preston (10), Bruce (1), 
and Mapheus Smith (11), since the sums of the original ranks were not 
published, nor given. 


EXAMPLE 1


The following data obtained by lifting weights under conditions some- 
what less exacting than Dr. Gordon’s confirm her results. 

Seventy-four students ranked nine weights, ranging from 28 to 36 grams 
with intervals of one gram. 

The sums of ranks allotted were:

Weight (grams)    36    35    34    33    32    31    30    29    28
Sum of ranks     150   205   270   315   389   381   481   559   580

and for these $\bar{r}_{kK} = .736$, $\bar{r}_{hk} = .545$. Thus $r_{\bar{k}K} = .991$ and $\bar{r}_{kK}/\sqrt{\bar{r}_{hk}} = .997$.

The evidence for bias is even less than in Gordon’s experiment. This 
is not surprising since the difference between successive weights was pro- 
portionally much greater. 

In the work referred to so far, a satisfactory criterion, weight, number, 
or denomination of cards, is available. It is by no means certain that such 
a satisfactory criterion can be found or even that it exists, in all cases. 
Again, it is usually agreed that weight is a satisfactory criterion, but this 
is surely so only when such qualities as size, density, or homogeneity of 
material are made uniform. One example to illustrate this point is given 
below. When we leave physical measures and investigate mental or 
esthetic qualities, the position is even less secure. The Binet test, however,
is usually accepted as a satisfactory criterion of intelligence, and so a
second example is given where it has been so used. 

In both examples there is to be found a significant measure of agreement 
between judges and considerable divergence from the criterion used. Thus 
if we pool the judgments we obtain a reliable but imperfectly valid result. 
It seems quite likely that if the number of judges is increased we shall 
obtain a more reliable measure in the sense that the average inter-correlation 
may rise, but it is difficult to see how we can maintain that the pooled 
judgments will become more valid, unless we suppose that the new judges 
correlate more highly with the criterion. If this is so, we should do even 
better by discarding all the old judges. 


EXAMPLE 2 


Nine boxes ranging in size from 18 to 1.5 cubic inches were weighted 
so that the lightest weighed 28 grams and the heaviest 36, the difference 
between the boxes being one gram. Twenty subjects arranged the boxes
in order of weight. Hence $\bar{r}_{kK}$ (weight) = .090, $\bar{r}_{hk} = .785$, $r_{\bar{k}W} = .101$, and
Lt $r_{\bar{k}W} = .102$, and $\bar{r}_{kK}$ (size) = −.819, $\bar{r}_{hk} = .785$ (as before), $r_{\bar{k}S} = -.918$,
Lt $r_{\bar{k}S} = -.925$. (See table below.)


Box  Dimensions (ins.)                Volume      Rank by  Weight  Rank by  Sum of Ranks
                                      (cu. ins.)  Volume   (gr.)   Weight   Allotted
A    6 × 3 × 1                        18          1        31      6        177
B    3.3 × 2.3 × 1.5                  11.4        2        34      3        149
C    4.3 × 3.0 × 0.8                  10.3        3        30      7        134
D    3.5 × 3.3 × 0.8                   9.2        4        32      5        130
E    1.6 × 1.6 × 2.2                   5.6        5        36      1         83
F    1.2 × 0.9 × 2.4                   2.6        6        35      2         47
G    2.3 × 1.5 × 0.7                   2.4        7        28      9         58
H    cylinder of diameter 1.2 × 1.9    2.1        8        29      8         81
I    cylinder of diameter 0.8 × 2.9    1.5        9        33      4         41


Thus the evidence is that this group of subjects was seriously in error 
in its estimation of weight, which corresponded in fact closely to the inverse 
of size, or in other words, density. This is important, for we ordinarily 
accept weight as a valid criterion and that people will understand what we 
mean when we say weight. 

It may be said that this result is only an instance of the ‘size-weight 
illusion’; it is more useful to see it in its place as an example of the way in 
which human estimates may be affected by bias which does not disappear 
as the number of judges is increased. A number of such biasses or illusions 
have been identified in the field of psycho-physics. Little enough is known 
of the preconceived ideas, fashions, schools of thought, and prejudices, 


which correspond to them in more purely psychological fields, and in 
esthetics. 




EXAMPLE 3 


Judgment of intelligence from photographs. 
Material: Twelve photographs of boys (5). 
Criterion: Binet 1.Q. 

The judges, 130 university students, were instructed to rank the boys 
for intelligence from their photographs. They were told that their order 
would be later compared with the ‘Binet’ order. 

This example is worked out in full to illustrate the method. 


  1          2             3          4           5              6
 Boy   Sums of Ranks    d (845)      d²      Binet Order   d × Binet Order
         Allotted
  A       1,063          +218      47,524         3            +654
  B         822           -23         529         5            -115
  C         791           -54       2,916         2            -108
  D         686          -159      25,281         9          -1,431
  E         475          -370     136,900         4          -1,480
  F       1,236          +391     152,881        10          +3,910
  G         592          -253      64,009         6          -1,518
  H         980          +135      18,225         1            +135
  I         577          -268      71,824         8          -2,144
  J         872           +27         729        12            +324
  K         907           +62       3,844         7            +434
  L       1,139          +294      86,436        11          +3,234
 Total   10,140             0     611,098                     1,895
                                    = S                      = S_kK

n = 12, i.e. number of items.
m = 130, i.e. number of judges.

$$ \bar{r}_{hk} = \frac{\dfrac{12S}{n^{3} - n} - m}{m^{2} - m} = .247, \qquad \bar{r}_{kK} = \frac{12\,S_{kK}}{m(n^{3} - n)} = .102. $$

$r_{\bar{k}K} = .203$ and Lt $r_{\bar{k}K} = .205$.


To test the significance of $\bar{r}_{hk}$ (9):

$e^{2z}$ is calculated from the expression $\dfrac{1 + (m-1)\,\bar{r}_{hk}}{1 - \bar{r}_{hk}}$, i.e. 43.6.
This value is referred to Fisher’s Table V (4) with degrees of freedom
$n_1 = (n - 1 - (2/m))$ and $n_2 = (m - 1)(n - 1 - (2/m))$, that is, 11 and
1417 approximately. As the .1 percent point for $n_1 = 12$ and $n_2 = \infty$ is
2.47, there can be no doubt of the significance of the value .247.

$\bar{r}_{kK}$ is the average of 130 separate rank correlation coefficients and therefore
its standard error is $\dfrac{1}{\sqrt{m(n-1)}} = .026$.

$\bar{r}_{kK} = .102$, and being nearly four times its standard error, is significant
by usual standards.
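
The whole of Example 3 can be recomputed from columns 2 and 5 of the table. The sketch below is a reconstruction under stated assumptions, not the author's own working: it writes n for the 12 photographs and m for the 130 judges, and it uses the standard rank identities (mean inter-correlation from the sum of squared deviations S, mean correlation with the criterion from the cross-product sum S_kK). Variable names are arbitrary.

```python
import math

# Columns 2 and 5 of the Example 3 table (boys A to L).
rank_sums = [1063, 822, 791, 686, 475, 1236, 592, 980, 577, 872, 907, 1139]
binet_order = [3, 5, 2, 9, 4, 10, 6, 1, 8, 12, 7, 11]

n = len(rank_sums)  # number of items (photographs)
m = 130             # number of judges

mean_sum = sum(rank_sums) / n                         # mean rank sum
d = [s - mean_sum for s in rank_sums]                 # column 3
S = sum(x * x for x in d)                             # total of column 4
S_kK = sum(x * b for x, b in zip(d, binet_order))     # total of column 6

# Mean inter-correlation between judges and mean correlation with the criterion.
r_hk = (12 * S / (n**3 - n) - m) / (m**2 - m)
r_kK = 12 * S_kK / (m * (n**3 - n))

# Formula (viii) and its limit (ix).
pooled = math.sqrt(m) * r_kK / math.sqrt(1 + (m - 1) * r_hk)
ceiling = r_kK / math.sqrt(r_hk)

# Significance of r_hk: variance ratio e^(2z) and its degrees of freedom.
variance_ratio = (1 + (m - 1) * r_hk) / (1 - r_hk)
n1 = n - 1 - 2 / m
n2 = (m - 1) * n1

# Standard error of the mean of m rank correlations on n items.
se_r_kK = 1 / math.sqrt(m * (n - 1))

print(f"S = {S:.0f}, S_kK = {S_kK:.0f}")
print(f"r_hk = {r_hk:.3f}, r_kK = {r_kK:.3f}")
print(f"pooled = {pooled:.3f}, ceiling = {ceiling:.3f}")
print(f"e^(2z) = {variance_ratio:.1f}, n1 = {n1:.2f}, n2 = {n2:.0f}")
print(f"standard error of r_kK = {se_r_kK:.3f}")
```

Working from the unrounded mean inter-correlation gives a variance ratio slightly above the 43.6 obtained from the rounded value .247; the difference has no bearing on the conclusion.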

The evidence of these data is that these judgments based on appearance 
in photographs are systematically different from and irreconcilable with, 
but significantly related to whatever is measured by the Binet test. The
use of the Binet I.Q. as a criterion of intelligence is open to criticism, but it
is as good a criterion as has yet been proposed. Both examples, 2 and 3, 
support the view that in human estimates, errors in general do not bal- 
ance out. 

When we proceed to esthetic judgments, the existence of a satisfactory 
criterion seems even more questionable. In point of fact, it makes very 
little difference whether or not we suppose there to be such a thing as a 
satisfactory esthetic criterion. If we suppose that there is not, then $\bar{r}_{kK}$
has no meaning, and discussion is restricted to $\bar{r}_{hk}$; in other words, we can
discuss reliability profitably, but not validity. If we suppose that there 
is an esthetic criterion, it still seems likely that it is inaccessible; and we 
may as well restrict ourselves to the discussion of reliability. We may, 
however, if we wish use the average of the judgments as an estimate of the 
criterion. From the evidence of the examples, such criteria should be 
treated with considerable caution; and it is rash to conclude with Mr. 
Eysenck that “the criterion will simply be the average judgments of a
large number of judges.”

It is legitimate, however, to regard the average judgments as descriptive 
of the group from which they are derived. In this way groups may be 
compared, and where there are differences each group will seem biassed
by comparison with the other. This method might be used for instance to 
compare the preferences or fashions in different groups; but there is no 
implication in the procedure that one group is right and another wrong. 

A statistical difficulty that has made its appearance throughout this
enquiry is that if $\bar{r}_{kK}/\sqrt{\bar{r}_{hk}}$ is less than unity, there is evidence of bias or systematic
error. It is certain, however, that both terms are liable to errors
of sampling, and it would be most desirable to have a method for determining
whether an experimental value of this expression differed significantly
from unity. While the standard errors of both terms are known,
it seems likely that some other method should be used. Reasons for saying
this are that the two terms may not be independent as the ratio approaches
unity (it is clear that $\dfrac{\sqrt{m}\,\bar{r}_{kK}}{\sqrt{1 + (m-1)\,\bar{r}_{hk}}}$ cannot be greater than 1, and it
seems likely, by reference to the Spearman-Brown formula, that as m increases,
$\bar{r}_{hk}$ will tend to increase), and while standard errors are suitable for
determining the significance of deviations from zero, it is very doubtful
whether they can be used for values approaching unity.


SUMMARY 


On the theoretical side: 

1. It is pointed out that the formula quoted by Mr. Eysenck is only an 
approximation. 

2. The approximate form is legitimate only when the two factor theory 
is applicable; the closeness of $\bar{r}_{kK}$ to $\sqrt{\bar{r}_{hk}}$ is valuable evidence on this point.
To assume the equality of $\bar{r}_{kK}$ and $\sqrt{\bar{r}_{hk}}$ without proof as Mr. Eysenck does
is to beg a fundamental question.




426 BERNARD BABINGTON SMITH 


On the practical side: 

3. Gordon’s data are re-examined, and it is shown that there is very 
slight evidence of bias or systematic error. 

4. Three examples are given: 

a. ‘Lifted Weights’; this confirms Gordon’s results, and shows even less 
evidence of bias. 

b. ‘Lifted weights of different sizes and shapes’; it is at once apparent that 
the average of the judgments gives a result grossly in error. 

c. Judgment of intelligence from photographs; the average of the judgments 
shows a small positive and significant correlation with the Binet I.Q., 
the judges agree to a significant degree, but no increase in the number 
of similar judges would allow the correlation with the Binet I.Q. to 
rise higher than approximately .25. 

5. These experiments show that errors in human estimates do not 
necessarily cancel out. 


Two more general conclusions are: 

6. In the absence of a known criterion and knowledge of the effect of 
bias of various descriptions, it is unsafe to discuss validity. Reliability of 
group judgments may be calculated in the form of the average
intercorrelation between judges, and the average of the judgments may be taken
as descriptive of the group or as an estimate of the criterion but not as 
constituting the criterion. 

7. It is highly desirable that some method should be devised for assessing
the significance of the difference between the ratio [illegible in the source]
and 1. The behavior of this ratio as $m$ increases seems to need further
investigation.


(Manuscript received October 25, 1940) 


REFERENCES 


1. Bruce, R. S., Group judgments in the fields of lifted weights and visual discrimination,
J. Psychol., 1935, 1, 117-121.
2. Burt, C., Marks of Examiners, London: Macmillan, 1936.
3. Eysenck, H. J., The validity of judgments as a function of the number of judges, J. exp.
Psychol., 1939, 25, 650-654.
4. Fisher, R. A., and Yates, F., Statistical Tables for Biological, Agricultural and Medical
Research, Edinburgh: Oliver & Boyd, 1938.
5. Gaskill, P. C., Fenton, N., and Porter, J. P., Judging the intelligence of boys from
their photographs, J. appl. Psychol., 1927, 11, 394-403.
6. Gordon, K., Group judgments in the field of lifted weights, J. exp. Psychol., 1924, 7, 398-400.
7. Gordon, K., Further observations on group judgments of lifted weights, J. Psychol., 1935,
1, 105-115.
8. Hull, C. L., Aptitude Testing, Yonkers: World Book Co., 1928, p. 262.
9. Kendall, M. G., and Babington Smith, B., The problem of m rankings, Ann. math. Statist.,
1939, 10, 275.
10. Preston, M. G., Note on the reliability and validity of the group judgment, J. exp. Psychol.,
1938, 22, 462-471.
11. Smith, M., Group judgments in the field of personality traits, J. exp. Psychol., 1931, 14,
562-565.
12. Stroop, J. R., Is the judgment of the group better than that of the average member of the
group?, J. exp. Psychol., 1932, 15, 550-562.




REPLY: VALIDITY AND RELIABILITY OF GROUP JUDGMENTS 427 


REPLY 


THE VALIDITY AND RELIABILITY OF GROUP JUDGMENTS 


BY H. J. EYSENCK 
(From the Psychological Laboratory, University College, London) 


In a recent paper I attempted to show that the validity of esthetic 
judgments increases as the number of judges increases, in accordance with 
a formula first given by Burt for reliability or self-consistency: 


$$ r_{tg} = \sqrt{\frac{n\,\bar r_{kk}}{1 + (n - 1)\,\bar r_{kk}}}, \qquad (1) $$


where $r_{tg}$ stands for the correlation of the average order with the ‘true
order,’ $\bar r_{kk}$ for the average intercorrelation, and $n$ for the number of persons
correlated. Using 900 preference-rankings of 12 uncolored pictures, I first 
showed that the average order of a ‘standard’ or ‘criterion’ group of 700 
subjects correlated perfectly with the average order of an ‘experimental’ 
group of 200 subjects. Then, taking the 200 rankings singly and in groups 
of 5, 10, 20, and 50, I showed that as the size of the group increased, so the 
correlation of their average order with the true order, i.e. the order given
by the ‘standard group,’ also increased as predicted by the formula (1). 
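
As a rough numerical illustration of formula (1), the following Python sketch evaluates the predicted correlation for several group sizes. It assumes that formula (1) reads $r_{tg} = \sqrt{n\bar r_{kk} / (1 + (n-1)\bar r_{kk})}$; the function name and the average intercorrelation of 0.20 are hypothetical and are not taken from the original data.

```python
import math


def predicted_validity(n: int, mean_intercorrelation: float) -> float:
    """Formula (1), read as sqrt(n*r / (1 + (n - 1)*r))."""
    r = mean_intercorrelation
    return math.sqrt(n * r / (1 + (n - 1) * r))


# Hypothetical average intercorrelation, chosen only for illustration.
ASSUMED_R_KK = 0.20

for group_size in (1, 5, 10, 20, 50):
    value = predicted_validity(group_size, ASSUMED_R_KK)
    print(f"n = {group_size:3d}  predicted r_tg = {value:.3f}")
```

For any positive average intercorrelation the predicted value rises with the size of the group and approaches, without reaching, unity.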

This demonstration has been criticized by Mr. Babington Smith on 
four main grounds. He maintains (1) that the equation used was only an 
approximation, and that the full formula should have been used; (2) that 
the approximate equation depends on the applicability of the two-factor 
theory, and that this theory was not shown to apply; (3) that in any case 
the formula deals with reliability, and not with validity; and (4) that the 
results of certain experiments described by him disprove a claim he attributes
to me, namely that “a measure shown to be reliable will in the long
run be perfectly valid.” Points (1) and (2) will be dealt with together, as
they are really inseparable; points (3) and (4) also belong together logically.
One or two further criticisms of minor importance will be dealt with in the
course of the argument.¹

It is of course quite true that the formula used is only an approximation, 
the full formula being given by Burt as: 


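$$ r_{tg} = \frac{n\,\bar r_{kg}}{\sqrt{n + n(n - 1)\,\bar r_{kk}}}, \qquad (2) $$

which reduces to (1) since

$$ \bar r_{kg} = \sqrt{\bar r_{kk}} \quad \text{(approximately)}. \qquad (3) $$

Now

$$ \bar r_{kg}^{\,2} = \frac{n\,\bar h_k^{2} + n(n - 1)\,\bar r_{kk}}{n^{2}}, \qquad (4) $$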


1 My thanks are due to Mr. Babington Smith for his courtesy in letting me see his MS.
before publication. I also want to record my gratitude to Professor C. Burt for kindly reading 
through and commenting upon the draft of this paper. 




428 H. J. EYSENCK 


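where $h_k^{2}$ is the communality ($= r_{k1}^{2} + r_{k2}^{2} + \cdots$). Equation (4)
reduces to

$$ \bar r_{kg}^{\,2} = \bar r_{kk} + \frac{\bar h_k^{2} - \bar r_{kk}}{n}, \qquad (5) $$

and from equation (5) it is apparent that the approximation is due to the
fact that $\bar h_k^{2} \neq \bar r_{kk}$ but $> \bar r_{kk}$. Substituting equation (5) in equation (2), we get

$$ r_{tg} = \sqrt{\frac{\bar h_k^{2} + (n - 1)\,\bar r_{kk}}{1 + (n - 1)\,\bar r_{kk}}}. \qquad (6) $$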


This formula enables us to form an idea of the size of the error intro- 
duced through the fact that an approximation formula only is used, and 
that more than one factor may be present. As Burt points out: “Since 
analysis by multiple factors is a process of averaging deviations about pre- 
ceding averages, the range of the correlations is reduced to rather more 
than half at each step; the treatment of the first factor as exclusively 
positive, instead of bipolar, produces the effect of missing a step” (2, p. 
358). Hence we will not go far wrong if we assume that in the great
majority of cases $\bar h_k^{2} < \tfrac{3}{2}\,\bar r_{kk}$; particularly as the latter terms in the
progression indicated by Burt ($1 + \tfrac{1}{2} + \tfrac{1}{4} + \cdots$) are very likely to be omitted
because of lack of statistical significance (2, p. 357). (As Davies has shown
in an analysis of all published tables of correlations between persons, in
only 4 out of 48 researches was even the second factor statistically significant,
while the third factor never reached the required level of significance
(3).) We must also take into account the fact that [inequality illegible in
the source] (if only slightly), but on the whole we may perhaps reasonably
assume that in actual practice the value of $\bar h_k^{2}$ will seldom if ever be
larger than $2\bar r_{kk}$, and in the majority of cases a good deal smaller.²

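Taking $\bar h_k^{2} = 2\bar r_{kk}$ as the probable maximum value, and substituting
this in equation (6), we get

$$ r_{tg} = \sqrt{\frac{(n + 1)\,\bar r_{kk}}{1 + (n - 1)\,\bar r_{kk}}}, \qquad (7) $$

and hence the probable maximum error in taking $\bar r_{kg} = \sqrt{\bar r_{kk}}$ is

$$ \sqrt{\frac{(n + 1)\,\bar r_{kk}}{1 + (n - 1)\,\bar r_{kk}}} - \sqrt{\frac{n\,\bar r_{kk}}{1 + (n - 1)\,\bar r_{kk}}}, \qquad (8) $$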


2 The value actually occurring in the experiment criticized is of course much smaller than 
this. As evidence for this contention we may cite the fact that the errors actually observed 
(i.e. the differences between the theoretical and actual values in Fig. 1 in my original article)
are not even half the maximum error. Seeing that the inaccuracy introduced through the use 
of the approximate formula and through the presence of any second, third, etc. factors only 
influences the result by changing the values in the leading diagonal of the correlation matrix, 
we may remember Thurstone’s observation that “the diagonal entries . . . may be given 
any value between zero and unity without affecting the results markedly, especially when the number 
of variables is large” (4, p. 108. My italics). 




which reduces to

$$ \sqrt{\frac{\bar r_{kk}\,\bigl[(2n + 1) - 2\sqrt{n^{2} + n}\,\bigr]}{1 + (n - 1)\,\bar r_{kk}}}. \qquad (9) $$


From equation (9) it is then possible, provided we know the average inter- 
correlation, to calculate the probable maximum error which can be intro- 
duced by using the approximate equation (1) instead of the full formula (2). 
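
The algebra leading from the difference of two square roots to the single expression can be checked numerically. The sketch below assumes that equation (7) sets the communality to twice the average intercorrelation, that equation (8) is the difference between (7) and (1), and that equation (9) is $\sqrt{\bar r_{kk}[(2n+1) - 2\sqrt{n^2+n}] / (1 + (n-1)\bar r_{kk})}$. The function names and the value 0.30 are hypothetical.

```python
import math


def approximate_rtg(n: int, r_kk: float) -> float:
    """Equation (1): the approximate formula."""
    return math.sqrt(n * r_kk / (1 + (n - 1) * r_kk))


def upper_rtg(n: int, r_kk: float) -> float:
    """Equation (7): equation (6) with the communality set to 2 * r_kk."""
    return math.sqrt((n + 1) * r_kk / (1 + (n - 1) * r_kk))


def max_error_eq8(n: int, r_kk: float) -> float:
    """Equation (8): difference between equations (7) and (1)."""
    return upper_rtg(n, r_kk) - approximate_rtg(n, r_kk)


def max_error_eq9(n: int, r_kk: float) -> float:
    """Equation (9): the same quantity written under one square root."""
    bracket = (2 * n + 1) - 2 * math.sqrt(n * n + n)
    return math.sqrt(r_kk * bracket / (1 + (n - 1) * r_kk))


# Hypothetical average intercorrelation, chosen only for illustration.
ASSUMED_R_KK = 0.30
for n in (5, 15, 50):
    print(n, round(max_error_eq8(n, ASSUMED_R_KK), 4),
          round(max_error_eq9(n, ASSUMED_R_KK), 4))
```

The two columns should agree, since $(\sqrt{n+1} - \sqrt{n})^2 = (2n+1) - 2\sqrt{n^2+n}$, and the error shrinks as the number of persons grows.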

If we have the $r_{tg}$ value calculated by the approximate equation, then
equation (9) can be rewritten in the following form which will be more
convenient for ascertaining the maximum amount by which the value of $r_{tg}$
from the approximate equation is likely to be increased:


$$ \text{Max. Error} = r_{tg}\left(\sqrt{\frac{n + 1}{n}} - 1\right). $$


In Table 1, below, are given the maximum errors so calculated for 
various numbers of persons and for various values of $r_{tg}$, as calculated by
the approximate formula. This table may be of interest as showing the 
maximum amount of error to which anyone using Table 2 in my original 
paper would be liable in the ordinary course of investigation. 


TABLE 1

                     Number of Rankings:
  $r_{tg}$:      5      10      20      50     200
    .10        .01     .00     .00     .00     .00
    .20        .02     .01     .00     .00     .00
    .30        .03     .01     .01     .00     .00
    .40        .04     .02     .01     .00     .00
    .50        .05     .02     .01     .00     .00
    .60        .06     .03     .01     .01     .00
    .70        .07     .03     .02     .01     .00
    .80        .08     .04     .02     .01     .00
    .90        .09     .04     .02     .01     .00
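
The entries of Table 1 can be recomputed with a few lines of Python. The sketch assumes that the maximum-error expression reads $r_{tg}(\sqrt{(n+1)/n} - 1)$, which appears to agree with the legible entries of the table; the function name is hypothetical.

```python
import math


def max_error_from_rtg(r_tg: float, n: int) -> float:
    """Maximum error, assuming the form r_tg * (sqrt((n + 1) / n) - 1)."""
    return r_tg * (math.sqrt((n + 1) / n) - 1)


group_sizes = (5, 10, 20, 50, 200)
print("r_tg  " + "".join(f"{n:>7d}" for n in group_sizes))
for tenth in range(1, 10):
    r_tg = tenth / 10
    row = "".join(f"{max_error_from_rtg(r_tg, n):7.2f}" for n in group_sizes)
    print(f"{r_tg:4.2f}  {row}")
```

Because the bracketed factor depends on the number of rankings alone, each column of the table is simply a constant multiple of $r_{tg}$.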


There are several points which deserve mention in connection with the 
preceding argument. The first is that the maximum error introduced by 
using the approximate formula is small (except when only 5 subjects are 
used; but cf. below). In the analysis of results such as one is likely to 
obtain in the ordinary course of investigation the actual errors found are 
of the order .01. Thus in five experiments carried out by the present writer
in an endeavor to strengthen and make statistically significant the second 
factor (thus increasing the amount of error over the usual), the following 
results were reached on the average: 


TABLE 2 


[symbol illegible] = .382    [symbol illegible] = .494    $r_{tg}$ = .942 (approximate formula)
[two entries illegible in the source]                     $r_{tg}$ = .955 (full formula)
N = 15                                                    Error = .013




Although the error involved in using an approximate formula may only 
be small, theoretically of course it is always preferable to use the exact 
formula. In practice, however, one must compare the gain in accuracy 
with the amount of extra work involved in the use of the full formula. 

In the case of the experiment criticized by Mr. Babington Smith, it 
would have been necessary to calculate 19,900 correlations, and to factorize, 
by a special iterative procedure, 40 tables containing 10 correlations each, 
20 tables containing 45 correlations each, 10 tables containing 190 correlations
each, 4 tables containing 1,225 correlations each, and 1 table containing
19,900 correlations.³ It must be left to the reader to judge whether
the possible gain in accuracy would have justified the amount of work 
required. 
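
The figures quoted in this paragraph follow from counting pairs: a table for a group of $k$ persons contains $k(k-1)/2$ correlations, and 200 subjects yield $200/k$ such groups. A minimal Python check is given below; the function name is hypothetical.

```python
from math import comb


def correlation_workload(total_subjects: int, group_size: int) -> tuple[int, int]:
    """Return (number of tables, correlations per table) for one group size."""
    tables = total_subjects // group_size
    correlations_per_table = comb(group_size, 2)  # pairs of persons in a group
    return tables, correlations_per_table


for size in (5, 10, 20, 50, 200):
    tables, per_table = correlation_workload(200, size)
    print(f"groups of {size:3d}: {tables:2d} tables of {per_table} correlations")
```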

The true position seems to be this: the errors involved in using the 
approximation are appreciable only when $n$ is very small. But rather than
attempting to derive accurate data from such very small samples by means 
of refined statistical procedure—an illusory accuracy in any case, because 
of sampling errors—it would be better to increase the number of subjects, 
and thus decrease the size of the error involved in using the approximate 
formula. The number of subjects need not be unmanageably large to 
achieve this object—as Table 2 shows, even with 15 people the average 
error involved was only .01 in 5 experiments. We would agree with Mr.
Babington Smith, however, that in cases where for some reason or other 
only few subjects were available, or in cases of particular theoretical interest, 
where accuracy is of supreme importance, the exact formula should be used, 
rather than the approximation. 

Two minor points in this connection may be of interest. The errors 
introduced through the use of the approximate formula are on the con- 
servative side; they underestimate the correlation with the ‘true order.’ 
Thus when Table 2 of my original paper is used, as I suggested, to “enable
the investigator to judge at a glance which way the results are tending, or 
how many subjects to use,” he will always be on the safe side by following 
the guidance of the table implicitly. Secondly, it would have been quite 
impossible to construct a table of this kind by using the full formula, as 
the full formula contains too many unknowns. 

The second major point in this discussion is connected with Mr. Babing- 
ton Smith’s second criticism. It will be clear to those who have followed 
the mathematical argument that any factor other than the first, general, 
factor can influence the result only by increasing the value of $\bar h_k^{2}$, i.e. by
increasing (very slightly) the amount of inaccuracy introduced through the
use of the approximate formula. (This effect has been taken into account
in equation (7) and the subsequent discussion.) That means that if the
two-factor theory does not hold, our results are affected quantitatively; 


3 Mr. Babington Smith, in a private communication, has objected to this argument because 
he maintains that by following his procedure it is unnecessary to calculate all these correlations. 
But that is not really material to the argument; as presented by Burt, the full formula does 
require calculation of these values, and later developments were not available when I made 
my decision as to which of his two formulae to use. It is this decision which Mr. Babington 
Smith has criticised. 




Mr. Babington Smith seems to suggest that it affects the result qualita- 
tively, by making the formula altogether inapplicable. 

Thus we conclude that the formula used was an approximation, and 
as an approximate formula applicable whether or not the two-factor theory 
holds. It is admitted that when very few subjects are used in an experi- 
ment, it is advisable to use the full formula rather than the approximation, 
although even then it would seem preferable to increase the number of 
subjects rather than to attempt to reach a rather spurious accuracy by 
means of fuller statistical calculations. As the number of subjects increases,
however, the errors involved decrease rapidly whether the two-factor theory 
holds or not; hence when dealing with 15 or more subjects the use of the 
approximate formula would seem quite justified, especially as the errors 
introduced are on the conservative side. 

Mr. Babington Smith’s third criticism raises a point which is even more 
important than those already considered, and will probably be of wider 
interest. He maintains that “Mr. Eysenck’s formula deals with reliability 
and not with validity,” and appears to consider unfounded my claim to 
have shown that the validity of esthetic judgments increases as the number 
of judges increases. 

We can meet this criticism in two different ways, firstly as applied to 
our experiment, and secondly, as applied to the general case. Validity is 
defined as the correlation of some test with an accepted outside criterion, 
and although such a criterion was provided in our experiment that would 
not always be so. In the case of our experiment, the average order pro- 
duced by the ‘standard’ group of 700 subjects was the criterion against 
which the orders of the 200 subjects in the ‘experimental’ group, singly 
and in groups, were validated. This is what Hull would call a ‘subjective 
judgment criterion,’ which according to him takes its place beside the 
‘product’ and the ‘action’ criterion (5, p. 375). The critic may possibly 
doubt the soundness of this criterion; as Guilford points out “students of
esthetics especially are often of the opinion that the combined judgments
of the masses should count for little as compared with the judgments of a
single ‘expert’” (6, p. 259). It seems, however, that such a criticism is
hardly tenable in the face of the results of experiments reported by Dewar
(7), Bulley (8), Semeonoff (9) and others who found a very strong tendency
for the average judgments of large numbers of laymen, even of very young 
children, to agree perfectly with the judgments of experts. (Fechner also 
comments upon this effect.) In terms of the theory of the general factor 
of esthetic appreciation, as outlined by Burt (10), we would say that the 
expert is more highly saturated with this general factor than the layman is 
likely to be, but that we are dealing with the same factor in both cases. 
Hence the judgment of a group of 700 people may be regarded as at least
as good as that of a group of 10 experts, say, and probably better.

But as pointed out above, this case is rather exceptional, and in general 
such an outside criterion will not be available. It is possible even then, 
however, to argue with Guilford that “the determination of the validity 
of test items . . . may be carried out without the use of an outside criterion.” 
As he points out, “Especially with personality tests, for which it is difficult




to find a valid objective criterion, it has been customary to let the test become 
its own criterion.” (6, p. 451. My italics.) That is precisely what would
be done in the ordinary case to which I would suggest applying Burt’s 
formula: the criterion would be derived from the judgments themselves. In a 
precisely analogous fashion, ‘g’ was derived from the intercorrelations of 
so-called tests of intelligence. It is then open to us to identify ‘g’ with a 
term of common reference, such as ‘intelligence,’ just as more recently ‘T,’ 
the general factor of esthetic judgment, has been identified with ‘good 
taste’ (11). In this way we arrive at an exact, operational definition of 
such vague everyday terms as ‘beauty’ and ‘intelligence’ which in their 
ordinary form are useless for scientific discussion. This suggested use of 
the word ‘validity’ has good authority on its side, and seems to avoid both 
the extreme views advanced in this sphere: the ultra-conservative view of 
Babington Smith, who would restrict the use of the term unduly, and the 
rather too revolutionary view of Carr, who would abolish all differentiation 
between reliability and validity (12). 

Guilford goes on to say that the chief criticism against this procedure, 
as applied to personality tests, is that we do not know what the real dimen- 
sions or variables of personality are. Such a criticism can hardly be levelled 
against this procedure in the field of esthetics, as due largely to the work 
of Professor Burt and his students the fundamental bases of esthetic judg- 
ments are no longer unknown. (In the field of temperament, the position 
has also improved considerably, due mainly to the important work done 
by Guilford himself in his search for Personality Factors.) As regards 
esthetics, it would appear that first of all we are dealing with a general 
factor which enters into our judgments of painting, poetry, sculpture, 
designs, music, prose, photography, colors, and even odors (10, 11, 13). 
Secondly, we deal with bipolar ‘type’ factors which seem to be closely 
connected with emotional and temperamental characteristics (14, 15, 16). 
Apart from these factors which go to make up the ‘communality’ of each 
person’s judgment we have the ‘unique’ specific and error factors which 
characterize one person at all times (specific) and one person at any one 
particular time (errors). Associations, familiarity, etc., are such specific 
factors; mood, environment, etc., are error factors. 

Of all these factors, it is only the general factor that is of importance 
for our argument; the other factors will cancel out in the long run, as 
Fechner pointed out long ago (Vorschule, 1, p. 194). In attempting to 
derive the criterion from the judgments themselves, we are trying to do 
implicitly what has been attempted elsewhere explicitly (17): to make a 
psychological analysis of the bases of judgment actually employed by the 
subjects, i.e. of the general factor. Such an attempt is similar in its aim
to Professor Spearman’s famous analysis of ‘g’ into his neogenetic laws, 
although it cannot of course be claimed that it has advanced as far towards 
its goal. 

This leads us to the last point made by Mr. Babington Smith. He 
maintains that “when Mr. Eysenck’s claim is restated in the form that
‘a measure shown to be reliable will in the long run be perfectly valid’ it is 
more easily seen to have over-reached itself,” and goes on to claim that 




“examples are given . . . where there is a significant and even high degree
of agreement between judges with respect to rankings the sum of which
differs from the criterion.” Now properly qualified Mr. Babington Smith’s
statement is a correct interpretation of my view. In my original paper
I said that “one great difficulty in experimentation of this kind is that the
criterion . . . is not given externally, as in the case of the weights, but has 
to be deduced from the experimental data themselves” (1, p. 650). This 
qualifies the general statement against which Mr. Babington Smith argues, 
and restricts its application merely to cases in which the criterion has to 
be deduced from the experimental data themselves. In cases of this kind 
it is indeed true that reliability and validity do come to the same thing, 
hence in this restricted sense I agree with Mr. Babington Smith’s statement. 
But all his experiments do contain an external criterion, such as the weight 
of the containers, or the I.Q. of the person whose picture is judged for his 
intelligence, and hence it cannot be admitted that they have any bearing on 
the discussion, however interesting they may be in themselves. 

Mr. Babington Smith has objected to this view on the grounds that 
until the exact weights (in his weight lifting experiment) are known the case 
is exactly similar to those cases in which there is no external criterion, and 
that therefore the suggested differentiation must break down. That is not 
correct. We deal roughly with two different classes of judgments: On the 
one side, we have those where an ‘objective,’ outside criterion is either 
available or possible; judgments of weights would be included here. On 
the other side, we have judgments where the criterion can only consist, or 
be derived from, the judgments themselves; judgments of beauty would 
come under this head. For if, with St. Thomas Aquinas and most modern 
psychologists, we define the beautiful as “Id cuius ipsa apprehensio placet,”
the apprehensio is clearly the only criterion which we can have, by the very 
nature of the case. This apprehensio is expressed as a judgment, and a 
study of these judgments is the only possible way of studying the beautiful. 
We must of course take care in our experiments that it is really the ipsa 
apprehensio which pleases, and not some outside effect, such as prestige 
value (“What a lovely picture—it must have cost a lot of money’’). Thus 
the difference between the two classes lies in the possibility, not in the 
availability of an external criterion. 

The same argument applies to Mr. Babington Smith’s contention that 
his experiments prove that “errors in human estimates do not necessarily 
cancel out.” This is not a novel statement, and it does not contradict
what I myself have said. My claim was that chance errors tended to cancel 
out in the long run; Mr. Babington Smith is dealing with systematic errors. 
A systematic error in factor analysis would form part of the communality 
of the test (or of the person, when persons are correlated); chance errors 
would form part of the uniqueness of the test (or the person). But while 
a systematic error would indeed be an error when we are dealing with an 
outside criterion, we can hardly regard it as in any sense an error when 
dealing with something that has no outside criterion. There it would be 
of the nature of a ‘type’-factor (15) or of a ‘group’-factor (18); nobody 
would regard the verbal group-factor as a systematic error, for instance! 




Thus the only errors we deal with in our analysis are chance errors, i.e.
errors which cancel out by definition. 
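
The distinction between chance and systematic errors, in the case where an outside criterion exists, may be illustrated by a small simulation. The sketch below is only an illustration under stated assumptions: the criterion value of 100, the shared bias of 5, the chance-error standard deviation of 10, and the function name are all hypothetical and do not come from any of the experiments discussed.

```python
import random
import statistics


def average_judgment(true_value, bias, chance_sd, n_judges, rng):
    """Mean of n_judges judgments, each = true value + shared bias + chance error."""
    judgments = [
        true_value + bias + rng.gauss(0.0, chance_sd) for _ in range(n_judges)
    ]
    return statistics.fmean(judgments)


rng = random.Random(0)   # fixed seed so the run is repeatable
TRUE_VALUE = 100.0       # hypothetical outside criterion

for n_judges in (10, 1_000, 10_000):
    chance_only = average_judgment(TRUE_VALUE, 0.0, 10.0, n_judges, rng)
    with_bias = average_judgment(TRUE_VALUE, 5.0, 10.0, n_judges, rng)
    print(f"{n_judges:6d} judges: chance only {chance_only:7.2f}, "
          f"with shared bias {with_bias:7.2f}")
```

As the number of judges grows, the average of the judgments with chance errors alone settles near the criterion, while the average carrying a shared bias settles near the criterion plus that bias.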

It would be interesting to follow up further the very suggestive criti- 
cisms brought forward by Mr. Babington Smith, but we must forbear doing 
so, and can only refer the reader to Burt’s extensive discussion of these and 
similar points (10). Mr. Babington Smith’s objections to our view seem 
to be based fundamentally on a philosophic view which regards esthetic 
judgments as subjective; indeed, Mr. Babington Smith would extend this 
view to ‘most other judgments.’ As this is a philosophic question, there 
can be little gain in pursuing it any further here; in fact, like so many other 
philosophic questions, it would appear to be mainly a question of definition, 
not of fact. On the facts, there can, I think, be little dispute, and it is to 
be hoped that agreement may be reached on this basis. 


(Manuscript received April 9, 1941) 


REFERENCES 


1. Eysenck, H. J., The validity of judgments as a function of the number of judges, J. exp.
Psychol., 1939, 25, 650-654.
2. Burt, C., The Factors of the Mind, London: Univ. of London Press, 1940.
3. Davies, M., The general factor in correlations between persons, Brit. J. Psychol., 1939,
29, 404-421.
4. Thurstone, L. L., Primary Mental Abilities, Chicago: University of Chicago Press, 1939.
5. Hull, C. L., Aptitude Testing, Yonkers: World Book Company, 1928.
6. Guilford, J. P., Psychometric Methods, New York: McGraw-Hill Book Co., 1936.
7. Dewar, H., A comparison of tests of artistic appreciation, Brit. J. educ. Psychol., 1938, 8,
29-49.
8. Bulley, M., Have you good taste?, London: Methuen, 1933.
9. Semeonoff, B., Further developments in a new approach to the testing of musical ability,
Brit. J. Psychol., 1940, 31, 145-161.
10. Burt, C., The psychology of art (In: How the Mind Works, by C. Burt et al.), London:
Allen & Unwin, 1933.
11. Eysenck, H. J., The general factor in esthetic judgments, Brit. J. Psychol., 1940, 31, 94-102.
12. Carr, H. A., The reliability vs. the validity of test scores, Psychol. Rev., 1938, 45, 435-440.
13. Williams, E. D., et al., Tests of literary appreciation, Brit. J. educ. Psychol., 1938, 8, 265-284.
14. Burt, C., The factorial analysis of emotional traits, Char. and Person., 1939, 7, 238-254;
285-299.
15. Eysenck, H. J., ‘Type’-factors in esthetic judgments, Brit. J. Psychol., 1940, 31, 262-270.
16. Eysenck, H. J., Some factors in the appreciation of poetry, and their relation to
temperamental qualities, Char. and Person., 1940, 9, 160-167.
17. Eysenck, H. J., The empirical determination of an esthetic formula, Psychol. Rev., 1941,
48, 83-92.
18. Eysenck, H. J., Critical notice of Primary Mental Abilities, by L. L. Thurstone, Brit. J.
educ. Psychol., 1939, 9, 270-275.




